Joe Wilcox has a post that has me scratching my head today. In his post Even More on New Office File Formats, he writes:

Friday's eWeek story about Microsoft XML-based formats certainly raises some questions about how open they really are. Assuming reporter Peter Galli has his facts straight, Microsoft's formats license "is incompatible with the GNU General Public License and will thus prevent many free and open-source software projects from using the formats." Earlier this month, I raised different concerns about the new formats' openness.

To reiterate a point I made a few weeks ago: Microsoft's new Office formats are not XML. The company may call them "Microsoft Office Open XML Formats," but they are XML-based, which is nowhere near the same as being XML or open, as has been widely misreported by many blogsites and news outlets.

There are two points I'd like to make here. The first is that "being GPL compatible" isn't a definition of 'open' that I've ever heard anyone use. It isn't even the definition of Open Source or Free Software (as in speech). Heck, even the GNU website has a long list of Open Source licenses that are incompatible with the GPL. You'll notice that this list includes the original BSD license, the Apache license, the Zope license, and the Mozilla Public License. I doubt that eWeek will be writing articles about how Apache and Mozilla are not 'open' because they aren't GPL compatible.

Secondly, it's completely unclear to me what distinction Joe Wilcox is making between being XML and being XML-based. The Microsoft Office Open XML formats are XML formats. They are stored on the hard drive as compressed XML files using standard compression techniques that are widely available on most platforms. Compressing an XML file doesn't change the fact that it is XML. Reading his linked posts doesn't provide any insight into whether this is the distinction Joe Wilcox is making or whether there is another. Anyone have any ideas about this?
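
To make the compression point concrete, here is a minimal Python sketch. It assumes the container is an ordinary ZIP archive using DEFLATE; the part name document.xml and the markup are made up for illustration and are not the real Office schema. It writes a small XML part into an archive, reads it back, and hands it to a stock XML parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical part name and markup, for illustration only.
# This is NOT the real Office schema.
PART_NAME = "document.xml"
XML_TEXT = "<document><paragraph>Hello, world</paragraph></document>"


def make_package(part_name: str, xml_text: str) -> bytes:
    """Write one XML part into an in-memory ZIP archive using DEFLATE."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(part_name, xml_text)
    return buffer.getvalue()


def read_part(package: bytes, part_name: str) -> ET.Element:
    """Decompress one part and parse it with an ordinary XML parser."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        raw = archive.read(part_name)
    return ET.fromstring(raw)


package = make_package(PART_NAME, XML_TEXT)
root = read_part(package, PART_NAME)
print(root.tag, root.find("paragraph").text)
```

Nothing in the round trip depends on who produced the archive: once the part is decompressed, any conforming XML parser can read it.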



Monday, June 20, 2005 5:28:58 PM (GMT Daylight Time, UTC+01:00)
Looking at a previous post of his, I believe his point is that the Office XML schema is complicated and uses a good deal of "jargon" to make it essentially a proprietary format, even though it is based on XML.

I can perhaps see that the XML generated might not be simple, but it's certainly XML, and any moderately complicated document will require a certain amount of complexity to represent it in XML. It's still straight XML to me.
Monday, June 20, 2005 5:30:53 PM (GMT Daylight Time, UTC+01:00)
When Joe makes the distinction between being XML and XML-based, he does so because Microsoft's own marketing of the Office XML is fuzzy on the point, trying to imply that somehow Office XML is inherently open simply because it is expressed as XML.

The reality, of course, is that it is perfectly possible to define closed, proprietary formats in XML. Whether Office XML is truly open or not is something we'll have to wait and see on, but given Microsoft's track record on patent issues and the language now being used, some scepticism is warranted.
Andrew Shebanow
Monday, June 20, 2005 5:59:01 PM (GMT Daylight Time, UTC+01:00)

I recommend contacting Joe directly. He has responded politely and with due consideration to my direct email comments in the past. Use the email link at the bottom of each post; the one on the right side of the page won't work.

Tuesday, June 21, 2005 1:17:27 AM (GMT Daylight Time, UTC+01:00)
Hey Dare,
I just posted an example document up on my blog that shows a preview of what the new format will look like. It's pretty clear that it's just ZIP and XML.

Also, for the people saying "we'll have to wait and see", you can already view all the documentation for the Office 2003 schemas today. There is a royalty-free license that allows folks to use the formats without worrying about patents, and without owing MS any money. The Office 12 formats will follow the same license, and we'll have even more documentation available.

Tuesday, June 21, 2005 2:43:49 AM (GMT Daylight Time, UTC+01:00)
I realize it isn't couth to say, and certainly not from an employee (though for several that is no hindrance), but many tech reporters just like to add illogical and incorrect enhancements to any piece of news. Their tech yellow journalism is rather nauseating.

Office 2003 has generated well-formed and valid XML for quite some time now. I have no doubt that Office 2006 (guessing that'll be its name) will do the same. Every Microsoft product that I'm aware of that claimed to support XML did in fact support real XML. Why should we expect that Office 2006 will be any different? Why should we expect the license to be any different than it is for the schemas of Office 2003? There is no reason that I can see, and certainly the authors of the Chicken Little articles about Microsoft vs. Open Source in the matter of Office document schemas have not brought up any reasons.

Sorry, I should vent on my own blog not yours...yours is just a little closer at the moment.