Tim Bray has a post entitled "Thought Experiments" in which he writes:

To keep things short, let’s call OpenDocument Format 1.0 "ODF" and the Office 12 XML File Formats "O12X".

Alternatives · In ODF we have a format that’s already a stable OASIS standard and has multiple shipping implementations. In O12X we have a format that will become a stable ECMA standard with one shipping implementation sometime a year or two from now, depending on software-development and standards-process timetables. ODF is in the process of working its way through ISO, and O12X will apparently be sent down that road too, which should put ISO in an interesting situation.

On the technology side, the two formats are really more alike than they are different. But, there are differences: O12X's design center, Microsoft has said repeatedly, is capturing the exact semantics of the billions of existing Microsoft Office documents. ODF’s design center is general-purpose reusability, and leveraging existing standards like SVG and MathML and so on.

Which do you like better? I know which one I’d pick. But I think we’re missing the point.

Why Are There Two? · Almost all office documents are just paragraphs of text, with some bold and some italics and some lists and some tables and some pictures. Almost all spreadsheets are numbers and labels, with some sums and averages and pivots and simple algebra. Almost all presentations are lists of bullet points with occasional pictures.

The capabilities of ODF and O12X are essentially identical for all this basic stuff. So why in the flaming hell does the world need two incompatible formats to express it? The answer, obviously, is, "it doesn’t".

I find it extremely ironic that one of the driving forces behind creating a redundant and duplicative XML format for website syndication would be among the first to claim that we need only one XML format to solve any problem. For those who aren't in the know, Tim Bray is one of the chairs of the Atom Working Group in the IETF, whose primary goal is to create a competing format to RSS 2.0 that does basically the same thing. In fact, Tim Bray has written a decent number of posts attempting to explain why we need multiple XML formats for syndicating blog posts, news and enclosures on the Web.

But let's ignore the messenger and focus on the message. Tim Bray's question is quite fair and in fact he answers it later on in his blog entry. As Tim Bray writes, "Microsoft wants there to be an office-document XML format that covers their billions of legacy documents". That's it in a nutshell. Microsoft created XML versions of its binary document formats like .doc and .xls that had full fidelity with the features of these formats. That way a user can convert a 'legacy' binary Office document to a more interoperable Office XML document without worrying about losing data, formatting or embedded rich media. This is a very important goal for the Microsoft Office team and very different from the goal of the designers of the OpenDocument format. 

Is it technically possible to create a 'common shared office-XML dialect for the basics' as Tim Bray suggests? It is. It'll probably take several years (e.g. the Atom syndication format, which is simply a derivative of RSS, has taken over two years to come to fruition), and once it is done, Microsoft will have to 'embrace and extend' it to meet its primary goal of 100% backwards compatibility with its legacy formats. And that doesn't answer the question of what Microsoft should ship in the meantime with regard to file formats in its Office products. After all, Office 12 is scheduled to ship in the second half of 2006.

There is no simple technical solution on the horizon that will change the fact that there will be multiple XML formats for Office documents. What we need to agree on is the best way forward, not attempt to demonize each other for trying to do what's best for our customers.

Disclaimer: I work at Microsoft. However, I do not work in any area related to the Office XML formats. The above is my personal opinion and should not be construed as an expression of the opinions, intents or strategies of my employer.


Categories: XML
