In a recent post entitled 15 Science Street, Tim Bray, one of the inventors of XML, writes

Microsoft’s main talking point (I’m guessing here from the public documents) was that their software and format had the advantage that in WordML you can edit documents from arbitrary schemas.

Our pushback on that was that editing arbitrary-schema documents is damn hard and damn expensive and has never been anything more than a niche business.

which does not jibe with my experience. Many businesses have XML formats specific to their target industry (LegalXML, HR-XML, FpML, etc.), and many businesses use office productivity suites to create and edit documents. It seems logical to expect that people would like to use their existing spreadsheet and word processing applications to edit their business documents instead of using XML editors or specialized tools. More interestingly, Tim Bray contradicts his position that editing user-defined schemas is a niche scenario when he writes

As we were winding up, a couple of really smart people (don’t know who they were) put up their hands and asked real good questions. The best was essentially “What would you like to see happen?” After some back and forth, I ended up with “You should have the right to own your own information. It’s your intellectual capital and you worked hard to produce it for your citizens. Sun doesn’t own it, Microsoft doesn’t own it, you own it, and that means it should be living in a nice, long-lived, non-proprietary data format that isn’t anyone’s competitive weapon.”

He took the words right out of my mouth. This is exactly what Microsoft has done with Office 2003 by allowing users to edit documents in XML formats of their choosing. In the letter Bringing the XML Vision to the Desktop with Office 2003, Jean Paoli of Microsoft (also a co-inventor of XML) writes

an even greater and more innovative benefit is the fact that companies can now create their own XML schemas specific to their business, define the structure and type of data that each data element in a document contains and exchange information with customers and business partners more easily. This capability opens up a whole new realm of possibilities, not only for end users, but also for the business itself because now organizations can capture and reuse critical information that in the past has been lost or gone unused. 

Office 2003 is a great step forward in enabling businesses and end users to harness the power of XML in typical document interchange scenarios. Arguments about whether you should use Sun's XML format or Microsoft's XML format aren't the point. The point is which tools allow you to use your XML format with the most ease.




Saturday, June 12, 2004 8:34:15 PM (GMT Daylight Time, UTC+01:00)
I beg to differ, Dare. The question for those elite few with custom business applications of XML schemas may well be "which tools allow you to use your XML format with the most ease", but for the millions who just want an office productivity suite to give them a productive office the question is actually "which tools allow you to use a standard XML format that leaves you free to choose over time, so you don't have to use Danny's plain text files". But confusing the two questions is neat marketing, I agree.
Sunday, June 13, 2004 9:25:47 AM (GMT Daylight Time, UTC+01:00)
Hmm, Simon, I think the situation is more like this:

1. The general population that uses an office productivity suite (OPS) does not care about XML. They care about being able to get their job done.

2. The majority of those who care that their OPS uses XML actually care about being able to add and use their own XML markup, since this allows them to add their own semantics and enables easier interop, repurposing, aggregation and abstraction of their documents.

3. There is a small minority that cares about the XML vocabulary used by default by the OPS.

If 3 were so important, formats such as RTF and LaTeX would have achieved much wider adoption.

But then, our interpretation of the general public may well be shaped by our own projections...
Thursday, June 17, 2004 2:06:43 AM (GMT Daylight Time, UTC+01:00)
I think we actually agree, Michael! Everyday OPS users don't care if it's XML at all. They /do/ care whether their documents are readable by everyone they need to exchange with and that their documents still work in 10 years' time. XML is the tool by which this concern of theirs can be solved (by agreeing a single, standard, extensible-but-extensive format with as few optionals as possible), and what I believe our market is demanding is that we all (your company and mine in particular) get our act together and agree that format.

Being XML gives those customers (not groups 2 or 3) the added benefit that they can readily "get inside" the format and build hacks to extend the longevity of the documents without use of APIs or macros, thus avoiding the dual problems of corporate Alzheimer's and loss of platform support over time for tools.

Ignoring the needs of group 1 (which is huge) to serve group 2 seems wrong to me. I know the OASIS work group would love to have some Microsoft participants. Will some join in to accept the advice the EU has given and meet this huge customer need?
Friday, June 18, 2004 5:24:19 PM (GMT Daylight Time, UTC+01:00)
I don't know about the other XML formats mentioned, but LegalXML is not widely used in the legal field.

With respect to why RTF and LaTeX never caught on, RTF was always perceived as a Microsoft format, and LaTeX was for geeks.

On the other hand, garden-variety XHTML combined with CSS (and its paged media tags, divisions, and class attributes) can handle anything you can throw at it, AND it is readable by normal people and familiar to a large and growing chunk of the population that maintains their own web sites or blogs. W3C-standard, non-"extended" XHTML is the way to go for documents.
Noah Frederick
Tuesday, June 22, 2004 2:16:56 AM (GMT Daylight Time, UTC+01:00)
The LegalXML eContracts TC does indeed need to be able to tag information specific to a contract (think parties, dates, dollar amounts, and possibly more abstract things like obligations). Indeed, we expect to be able to re-use vertical industry schemas (e.g. FpML for a contract that relates to derivatives).

The way we see the world, there are two independent layers. One is the structural layer (OpenOffice, WordML, XHTML 2, or other); the other is the semantic layer.

The semantic layer (parties, dates, dollar amounts etc.) could be implemented using an arbitrary schema on top of WordML (but note US patent application 20040006744), or "standaside" (where the two are related by XPath expressions à la XForms).
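A minimal sketch of the "standaside" arrangement, in Python: the structural layer is a small, namespace-free document standing in for WordML or XHTML, and a separate table binds each semantic field (parties, dates, amounts) to a path into it. The element names, `id` values, field names and the helpers `extract_semantics` and `to_semantic_xml` are all hypothetical, and the paths use ElementTree's limited XPath subset rather than the full XPath an XForms-style binding would use.

```python
import xml.etree.ElementTree as ET

# Structural layer: a hypothetical stand-in for a WordML/OpenOffice/XHTML
# body. It only knows about paragraphs and spans, not about contracts.
STRUCTURAL_DOC = """
<body>
  <p>This agreement is made between
     <span id="party-a">Acme Corp</span> and
     <span id="party-b">Globex Ltd</span>.</p>
  <p>Effective date: <span id="eff-date">2004-06-22</span>.</p>
  <p>Fee: <span id="fee">USD 10,000</span>.</p>
</body>
"""

# Semantic layer kept "standaside": each field points into the structural
# layer with a path expression instead of being embedded in it.
# Field names and paths are illustrative assumptions.
BINDINGS = {
    "partyA": ".//span[@id='party-a']",
    "partyB": ".//span[@id='party-b']",
    "effectiveDate": ".//span[@id='eff-date']",
    "fee": ".//span[@id='fee']",
}


def extract_semantics(structural_xml: str, bindings: dict[str, str]) -> dict[str, str]:
    """Resolve each binding against the structural document.

    ElementTree supports only a subset of XPath 1.0, which is enough for
    the attribute predicates used here.
    """
    root = ET.fromstring(structural_xml.strip())
    values = {}
    for field, path in bindings.items():
        node = root.find(path)
        if node is None:
            raise KeyError(f"binding for {field!r} matched nothing: {path}")
        values[field] = (node.text or "").strip()
    return values


def to_semantic_xml(values: dict[str, str]) -> str:
    """Serialise the semantic layer as its own small XML document."""
    contract = ET.Element("contract")
    for field, value in values.items():
        ET.SubElement(contract, field).text = value
    return ET.tostring(contract, encoding="unicode")


if __name__ == "__main__":
    semantics = extract_semantics(STRUCTURAL_DOC, BINDINGS)
    print(to_semantic_xml(semantics))
```

Because the semantic document only points into the structural one, the same semantic fields could in principle be re-targeted at a different structural format by changing the paths, which keeps the two layers independent.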

The structural layer is primarily of interest to IT people. End users care about the semantic layer. If you assume people will be authoring in Word, then WordML makes sense as the structural layer. If your users will be authoring in another tool, you need to worry about your choice of underlying structural layer.

[In the eContracts TC, we have determined that XHTML2 (in its current draft state) is not suitable for authoring, because it is too loose: it allows PCDATA in a section element, etc. It's a pity that you can't "prune" or tighten it without falling outside the various defined conformance levels.]

