XML 1.1: The W3C Gets It Wrong - Dare Obasanjo's weblog

February 6, 2004

@ 05:00 PM

A few days ago XML 1.1 became an official W3C recommendation. Mark Pilgrim, contrary to W3C guidelines, has celebrated by converting his RSS feed to XML 1.1, which means it currently cannot be processed by any Microsoft XML technology, from the XML parsers in the .NET Framework to MSXML, which is used in a host of products from Internet Explorer to Office 2003.

This is the first step in fragmenting the interoperability on the Web gained by XML. It seems the next step will be W3C sanctioned binary XML. Anyway, let's get back to XML 1.1. What exactly is wrong with it, one might ask? The biggest thing wrong with it is that it is backwards incompatible with XML 1.0. A good summary of all the things you need to know about XML 1.1 appears in Chapter 3 of Elliotte Rusty Harold's Effective XML:

Everything you need to know about XML 1.1 can be summed up in two rules:

Don't use it.

(For experts only) If you speak Mongolian, Yi, Cambodian, Amharic, Dhivehi, Burmese or a very few other languages and you want to write your markup (not your text but your markup) in these languages, then you can set the version attribute of the XML declaration to 1.1. Otherwise, refer to rule 1.

XML 1.1 does several things, one of them marginally useful to a few developers, the rest actively harmful.

It expands the set of characters allowed as name characters

The C0 control characters (except for NUL) such as form feed, vertical tab, BEL, and DC1 through DC4 are now allowed in XML text provided they are escaped as character references.

C1 control characters (except for NEL) must now be escaped as character references

NEL can be used in XML documents, but is resolved to a line feed on parsing.

Parsers may (but do not have to) tell client applications that Unicode data was not normalized

Namespace prefixes can be undeclared
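The character rules in the list above can be made concrete with a small sketch. It encodes the Char production from XML 1.0 and the Char and RestrictedChar productions from XML 1.1, plus the XML 1.1 line-end normalization that turns NEL into a line feed. The function names are made up for illustration and are not part of any parser's API.

```python
import re

def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char: may appear literally or as a character reference."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: everything except NUL, surrogates, U+FFFE and U+FFFF."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_xml11_restricted(cp: int) -> bool:
    """XML 1.1 RestrictedChar: legal only when written as a character reference."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )

def describe(cp: int) -> tuple:
    """Return (status in XML 1.0, status in XML 1.1) for a code point."""
    v10 = "allowed" if is_xml10_char(cp) else "forbidden"
    if not is_xml11_char(cp):
        v11 = "forbidden"
    elif is_xml11_restricted(cp):
        v11 = "character reference only"
    else:
        v11 = "allowed"
    return v10, v11

def normalize_line_ends_xml11(text: str) -> str:
    """XML 1.1 line-end handling: CR LF, CR NEL, lone CR, NEL and U+2028 become LF."""
    return re.sub("\r\n|\r\x85|\r|\x85|\u2028", "\n", text)
```

For example, `describe(0x0C)` (form feed) gives `("forbidden", "character reference only")`, while `describe(0x80)` (a C1 control) gives `("allowed", "character reference only")`. The second case is the backwards-incompatible one: a literal C1 control character that is fine in an XML 1.0 document is not well-formed in an XML 1.1 document.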

XML is a lousy format for most of the things it is used for. The one benefit it has is that it is widely supported and a guaranteed way to interoperate in a cross-platform manner. By tampering with this the W3C is effectively diluting one of the few benefits of using XML. This is a regrettable occurrence. Unfortunately it looks like things will get worse now that the W3C also wants to dabble in “binary XML”.

Categories: XML


Friday, 06 February 2004 21:48:32 (GMT Standard Time, UTC+00:00)

You alluded to an unfavorable opinion of "binary XML," namely the W3C's involvement in such an initiative. I'd be curious to see a more detailed account of your stance, as I respect your opinions and have heard this viewpoint more frequently lately.

I could simply be naive - not the first time - but, why is W3C's dabbling in such a standard going to make things worse?

As long as we're dealing at the Infoset level, and assuming standard XML parsers understand binary formats (critical assumption), we're all happy. No? Some people care little about serialization format, specifically the size of the message and the resource-intensity of the parsing process, while others have constraints that preclude them from using XML 1.0/1.1 because they do care about such things. Either they're going to adopt XML (aka, Infoset-isms and all of the goodness that relies on the Infoset versus the serialization format), or they aren't. The lack of a definitive standard in this space at best forces them to shop elsewhere and at worst encourages the proliferation of a number of non-standard, "proprietary" formats. The W3C currently has a dominant and respected position with XML that would nearly guarantee instant adoption (at least over the other "binary XML" standards that are out there).

XML serialization purity aside, if there is a market need for something, fighting it will merely force the would-be adopters to look elsewhere.

So, I would think we can agree on this: If somebody is going to create a de facto "binary XML" standard, the W3C is the best authority to do so. The question then becomes whether another standard serialization format is necessary and healthy for XML as a whole.

With regards to parsers understanding binary formats, yes this takes time. And yes, obviously existing parsers would barf if they were served up binary blobs instead of angle brackets and beauty. But is this not an essential link in the chain of XML's evolution, and essential to advancing the ubiquity of the XML platform?

Joe Duffy

Friday, 06 February 2004 22:12:26 (GMT Standard Time, UTC+00:00)

A good summary of my position [and that of Microsoft] on "binary XML" is available from reading http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=a2065106-11b5-4239-824c-5dcc1b525415

Dare Obasanjo

Saturday, 07 February 2004 00:20:28 (GMT Standard Time, UTC+00:00)

The XML 1.1 specification states that "Programs which generate XML SHOULD generate XML 1.0". Not MUST. My actions are allowable.

XML 1.0-only parsers must reject documents which are not well-formed XML 1.0. One of the requirements of well-formedness in XML 1.0 is that, if the version attribute is present, it must be exactly "1.0".

Since you claim that you can't read my feed, you must be using a conforming XML parser. Doesn't that just suck for you?

Mark
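The rejection described in the comment above hinges on the version value in the XML declaration. Below is a minimal sketch of that check, assuming the strict reading given in the comment (anything other than "1.0" is refused). The function names are hypothetical, the regular expression only handles a declaration at the very start of the text, and a real parser does far more than this.

```python
import re

# Matches an XML declaration at the start of a document and captures the
# quoted value of its version pseudo-attribute.
_XML_DECL = re.compile(r"""^<\?xml\s+version\s*=\s*(["'])([^"']*)\1""")

def declared_version(document: str) -> str:
    """Return the declared XML version, or "1.0" when no declaration is present."""
    match = _XML_DECL.match(document)
    return match.group(2) if match else "1.0"

def strict_xml10_accepts(document: str) -> bool:
    """Mimic a strict XML 1.0-only processor: refuse any other declared version."""
    return declared_version(document) == "1.0"
```

Under these assumptions, `strict_xml10_accepts('<?xml version="1.1"?><rss/>')` is `False` even though the body of the feed may contain nothing that XML 1.0 forbids, which is why changing only the declaration is enough to break existing readers.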

Saturday, 07 February 2004 00:23:59 (GMT Standard Time, UTC+00:00)

However, in the spirit of optimal standards compliance, I have added a Unicode character to my feed which only exists in Unicode 4.0, thus requiring XML 1.1 and (I hope) satisfying your objection.

Mark

Sunday, 08 February 2004 16:17:29 (GMT Standard Time, UTC+00:00)

The only thing you said was "it isn't compatible with MS" and that it's fragmenting the market. Well that's probably because MS has gone off on its own and introduced special "enhancements" making themselves like they've been doing since they started business.

Your title should be changed from "W3C fragments market" to "More bugs in MS".

ioconnor

Sunday, 08 February 2004 16:47:31 (GMT Standard Time, UTC+00:00)

ioconnor,
XML 1.1 isn't backwards compatible with XML 1.0 which means ANY conforming XML 1.0 processor will not be able to process XML 1.1 documents. This includes XML parsers produced by Sun, IBM, Open Source developers as well as Microsoft. Jon Udell has a screenshot of an XML 1.1 document failing in Mozilla at http://weblog.infoworld.com/udell/2004/02/06.html#a911 or are you suggesting that Mozilla is a "buggy Microsoft product" as well? :)

Dare Obasanjo

Sunday, 08 February 2004 17:41:13 (GMT Standard Time, UTC+00:00)

This is another example of the "I'm taking my ball and going home" crowd trying to prove a point where one doesn't need to be proven. Why break stuff that is already working?

Randy H.

Sunday, 08 February 2004 17:54:46 (GMT Standard Time, UTC+00:00)

Dare: Thanks for pointing out the W3C's recent decision.

Mark Pilgrim: Thanks for reinforcing my opinion of you--you are the George Barris of Internet standards. You can do anything with the technology, and while interesting to look at and talk about, at the end of the day, it's not very useful.

Who is George Barris? -- http://www.barris.com/

Steve Kirks

Monday, 09 February 2004 07:11:12 (GMT Standard Time, UTC+00:00)

You mean Chuck Barris? Gong Show.

http://www.retrocrush.com/archive/uknowncomic/chuck.jpg

Mark Pilgrim running dog capitalist lackey.

Dien Pho Huc

Tuesday, 10 February 2004 09:25:14 (GMT Standard Time, UTC+00:00)

Dare, good article. I wrote something similar earlier last week at http://sqljunkies.com/WebLog/mrys/archive/2004/02/05/972.aspx

For Joe regarding binary XML issues, see also:
http://sqljunkies.com/WebLog/mrys/archive/2003/11/20/488.aspx

Best regards
Michael

Michael Rys

Tuesday, 10 February 2004 22:47:17 (GMT Standard Time, UTC+00:00)

He he, it's so much fun trolling you, Dare. The only reason I used XML 1.1 in my feed for like 24 hours was to get Sam's attention long enough to fix this bug in the feed validator: http://sourceforge.net/tracker/index.php?func=detail&aid=892178&group_id=99943&atid=626803 . Also because it's guaranteed to get Dave's pathetic attention for 10 seconds and make him look even dumber than usual, since he'll Pavlovianly link to anything that he thinks makes me look bad.

I mean really, couldn't you tell I was kidding when I linked to the "why not to use XML 1.1" article (which you quoted, BTW, thanks for the attribution there... oops, wait, I forgot, you never bothered to mention that I found that tidbit for you, never mind)? You and Dave are such a perfect couple, it's hard to know who to feel more sorry for.

The sad thing is that you really did have a good point, buried (as usual) below a mountain of condescension. XML 1.1 as written serves no real purpose other than to fragment the web and pander to a few influential W3C members (but not, in this case, your employer). And if anyone starts using it in earnest, every client-side developer in the world is going to have to scramble to upgrade their XML parser. Not that you would know anything about the upgrade treadmill, being from Microsoft and all, but I can assure you that the rest of us are quite familiar with it and it has left a sour taste in our collective mouths more than once.

Mark

Wednesday, 11 February 2004 01:20:59 (GMT Standard Time, UTC+00:00)

Mark,
You truly are an amusing little troll. The main reason I bothered to use your feed as an example of the stupidity of XML 1.1 was because I got a bug report from an RSS Bandit user because it was erroring on your feed. As for attributing you for linking to Elliotte Rusty Harold's book excerpt, I'm quite sure I read it several months before I ever saw you link to it wherever you claim to have done so.

Dare Obasanjo

Thursday, 12 February 2004 12:27:26 (GMT Standard Time, UTC+00:00)

I'm probably missing something here, but why doesn't the W3C just change a small word in the XML 1.0 spec to make it "forward-compatible"? Then they can produce as many new XML flavours as they like.

And you two, please stop. Your behaviour is unbearable; you RSS/Atom guys come across like the worst usenet trolls.

Giacomo

Comments are closed.
