From Microsoft Announces Availability of Open and Royalty-Free License For Office 2003 XML Reference Schemas

Microsoft Corp. today announced the availability of a royalty-free licensing program for its Microsoft® Office 2003 XML Reference Schemas and accompanying documentation. ... Microsoft's new Office 2003 versions of Word, Excel and the InfoPath (TM) information-gathering program utilize schemas that describe how information is stored when documents are saved as XML....

To ensure broad availability and access, Microsoft is offering the royalty-free license using XML Schema Definitions (XSDs), the cross-industry standard developed by the W3C. The license provides access to the schemas and full documentation to interested parties and is designed for ease of use and adoption. The Microsoft Office 2003 XML Reference Schemas include WordprocessingML (Microsoft Office Word 2003), SpreadsheetML (Microsoft Office Excel 2003) and FormTemplate XML schemas (Microsoft Office InfoPath 2003).

The biggest gripe when Office 2003's XML support was announced was that the schemas for WordprocessingML (aka WordML) and co. were proprietary. This was reported in a number of fora including Slashdot & C|Net news. I wonder how many of them will carry the announcement that these schemas are now available for all to peruse and reuse in a royalty-free manner?

Update: On C|Net news: Microsoft pries open Office 2003

Update2: On Slashdot: Microsoft Word Document ML Schemas Published


 

Categories: XML

November 17, 2003
@ 06:32 AM

George Mladenov asked

Why does XAML need to be (well-formed) XML in the first place?

To which Rob Relyea responds with the following reasons

1.      Without extra work from the vendors involved, we’d like all XML editors be able to work with XAML.

2.      We’d like transformations (XSLT, other) be able to move content to/from XAML.

3.      We didn’t want to write our own file parsing code, the parser code we do have is built on top of System.XML.XmlTextReader.  We are able to focus on our value add.

Thus it looks like XAML's use of XML passes the XML Litmus Test, specifically

Using XML for a software development project buys you two things: (a) the ability to interoperate better with others and (b) a number of off-the-shelf tools for dealing with the format. If neither of these things applies to a given situation then it doesn't make much sense to use XML.

However there are tradeoffs to using XML, some of which Rob points out. They are listed below along with some of my opinions.

1.      We want to enable setting the Background property on a Button (for example) in one of two ways:

a.       Using a normal attribute - <Button Background="Red">Click Here</Button>

b.      Using our compound property syntax –

...

c.       Ideally if somebody tried to use both syntaxes at the same time, we could error.  XML Schema – as far as I am aware – isn’t well equipped to describe that behavior.

 

Being the PM for W3C XML Schema technologies in the .NET Framework means I get to see variations of this request regularly. This feature is typically called co-occurrence constraints; it is lacking in W3C XML Schema but is supported by other XML schema languages like RELAX NG and can be added to W3C XML Schema using Schematron annotations. Given the existing complexity of W3C XML Schema's conflicting design goals (validation language vs. type system) and contradictory rules, I for one am glad this feature doesn't exist in the language.

However this means that users who want to describe their schemas using W3C XML Schema need to face the fact that not all the constraints of their vocabulary can be expressed in the schema. This is always the case; it's just that some constraints seem significant enough to belong in the schema while others are OK being checked in code during "business logic processing". In such cases there are basically 3 choices: (i) try to come as close as possible to describing the content model in the schema, which sometimes leads to what us language lawyers like to call "gross hacks", (ii) use an alternate XML schema language or extend the W3C XML Schema language in some way, or (iii) live with the fact that some constraints won't be describable in the schema.
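
For illustration, here is a minimal C# sketch of the "check it in code during business logic processing" route (option iii), enforcing Rob's "not both syntaxes at once" rule outside the schema. The Button/Background names and the Button.Background compound property element are assumptions based on the XAML fragment quoted above, not the actual XAML vocabulary.

using System;
using System.Xml;

class CoOccurrenceCheck
{
    // Reject markup that sets Background both as an attribute and via the
    // compound property element, a constraint W3C XML Schema cannot express.
    static void Validate(XmlElement button)
    {
        bool asAttribute = button.HasAttribute("Background");
        bool asElement   = button.SelectSingleNode("Button.Background") != null;

        if (asAttribute && asElement)
            throw new XmlException(
                "The Background property is set using both syntaxes at once.");
    }
}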

It is a point of note that although the W3C XML Schema recommendation contains what looks like a schema for Schema (sForS) (i.e. the rules of W3C XML Schema are themselves described as a schema), this is in fact not the case. The schema in the spec, although normative, is invalid, and even if it were valid it still would not come close to rigidly specifying all the rules of W3C XML Schema. The way I look at it is simple: if the W3C XML Schema working group couldn't come up with a way to fully describe an XML vocabulary using XML Schema, then the average vocabulary designer shouldn't be bothered if they can't either.

2.      It is a bit strange, for designers or developers moving from HTML to XML.  HTML is forgiving.  XML isn't.  Should we shy away from XML so that people don't have to put quotes around things?  I think not.

Having to put quotes around everything isn't the biggest pain in the transition from HTML to XML, and after a while it comes naturally. A bigger pain is ensuring that nested tags are properly closed, and I'm glad I found James Clark's nxml-mode for Emacs, which has helped a lot with this. The XML Editor in the next version of Visual Studio should be similarly helpful in this regard.

The lack of the HTML predefined entities is also a bit of culture shock when moving to XML from HTML, and one that some consider a serious bug with XML; I tend to disagree.

3.      It is difficult to keep XAML as a human readable/writable markup, as that isn’t one of XML’s goals.  I think it needs to be one of XAML’s goals.  It is a continual balancing act.

Actually, one of the main goals of XML is to be human-readable, or at least as human-readable as HTML was, since it was originally intended to replace HTML. There's a quick history lesson in my SGML on the Web: A Failed Dream? post from last month.


 

Categories: XML

November 17, 2003
@ 04:30 AM

I just finished watching a TiVoed episode of Justice League in which a character died in battle. The character had been a moderately recurring one who was given some depth in the preceding episode before being killed off in this one. Coupled with Disney's Brother Bear, in which a major character who has just been introduced ends up dying and another whose significance we learn later dies as well, it seems like death in children's cartoons is no longer taboo.

I remember watching cartoons like Voltron & Thunder Cats as a kid and thinking that the fact that the major characters were never at risk of death made rooting for the good guys or against the bad guys a waste of time. Of course, I was one of the kids who was deeply affected when Optimus Prime bought it in Transformers: The Movie. That death was a solitary event in the cartoon landscape which didn't lead to the start of a trend as I expected, and it was itself diluted by the fact that they kept bringing Optimus Prime back in one shape or form every other episode.

This trip down memory lane makes me feel nostalgic for old episodes of my favorite cartoons. Time to go bargain hunting on Amazon.


 

Categories: Ramblings

November 15, 2003
@ 11:31 PM

We finally got around to adding some screen shots to the RSS Bandit wiki.

For those who are curious, there should be another release in the next couple of weeks. This should be mostly a bug fix release with a number of improvements in the responsiveness of the GUI. The only noticeable new features should be a new preferences tab for adding search engines to the ones available from the search bar, the ability to apply themes to feed items from the preferences dialog without having to exit the dialog, and the ability to search RSS items on disk.

Hopefully, if I can get some cooperation from a couple of folks, there may also be some changes to the subscription harmonization functionality.


 

Categories: RSS Bandit

Robert Scoble writes

Microsoft has 55,000 employees. $50 billion or so in the bank.

Yet what has gotten me to use the Web less and less lately? RSS 2.0.

Seriously. I rarely use the browser anymore (except to post my weblog since I use Radio UserLand).

See the irony there? Dave Winer (who at minimum popularized RSS 2.0) has done more to get me to move away from the Web than a huge international corporation that's supposedly focused on killing the Web.

Diego Duval responds

Robert: the web is not the browser.

Robert says that he's "using the web less and less" because of RSS. He's completely, 100% wrong.

RSS is not anti-web, RSS is the web at its best.

The web is a complex system, an interconnection of open protocols that run on any operating system
...
Let me say it again. The web is not the browser. The web is protocols and formats. Presentation is almost a side-effect.

Both of them have limited visions of what actually constitutes the World Wide Web. The current draft of the W3C's Architecture of the World Wide Web gives a definition of the Web that is more consistent with reality and highlights the limitations of both Diego's and Robert's opinions of what constitutes the WWW. The document currently states

The World Wide Web is a network-spanning information space consisting of resources, which are interconnected by links defined within that space. This information space is the basis of, and is shared by, a number of information systems. Within each of these systems, agents (e.g., browsers, servers, spiders, and proxies) provide, retrieve, create, analyze, and reason about resources.

This contradicts Robert's opinion that the Web is simply about HTML pages that you can view in a Web browser, and it contradicts Diego's statements that the Web is about "open" protocols that run on "any" operating system. There are a number of technologies that populate the Web whose "open-ness" some may question. I know better than to cast stones when I live in a glass house, but there are a few prominent examples that come to mind.

The way I read it, the Web is about URIs that identify resources that can be retrieved using HTTP by user agents. In this case, I agree with Diego that RSS 2.0 is all about the Web. A news aggregator is simply a Web agent that retrieves a particular Web resource (the RSS feed) at periodic intervals on behalf of the user, using HTTP as the transfer protocol.
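
To make that concrete, here is a minimal C# sketch of such a Web agent: a plain HTTP GET of a feed (the URL is a placeholder) followed by a little XML processing, which is fundamentally all an aggregator does on each polling pass.

using System;
using System.Xml;

class FeedPoller
{
    static void Main()
    {
        // XmlDocument.Load issues an ordinary HTTP GET for the Web resource,
        // just like a browser would; it simply doesn't render the result.
        XmlDocument feed = new XmlDocument();
        feed.Load("http://example.org/rss.xml");

        // Pull out the item titles from the RSS 2.0 feed.
        foreach (XmlNode title in feed.SelectNodes("/rss/channel/item/title"))
            Console.WriteLine(title.InnerText);
    }
}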


 

Categories: Ramblings

November 14, 2003
@ 04:22 PM

Fumiaki Yoshimatsu writes

Why does someone still think that they have to write Unicode BOMs by themselves, digging deep inside XmlTextWriter.BaseStream and UnicodeEncoding.GetPreamble?  Encoding hint in the XML declarations and Unicode BOMs are all about XML 1.0 thing, but WriteStartElement and WriteStartDocument are not.  They are InfoSet thing, so they do not have anything to do with the serialization format.  Think about XmlNodeWriter for example.  Why does XmlNodeWriter NOT have any constructor that have a parameter of type Encoding?  Why does it always call XmlDocument.CreateXmlDeclaration with null as the second argument?

This is a common point of confusion for users of XML in the CLR. XmlNodeWriter doesn't have a parameter of type Encoding because it writes to an XmlDocument, which is stored in memory, and all strings in the CLR are in the UTF-16 encoding. Setting the encoding only matters when saving the XmlDocument to a stream. As for having to dig into XmlTextWriter.BaseStream to set the encoding, I find this weird considering that the XmlTextWriter constructor provides a number of ways to specify the encoding when instantiating the class. Since XML 1.0 mandates that an XML document can only have one encoding, there is no reason for methods like WriteStartElement and WriteStartDocument to concern themselves with encoding issues.
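
For example, this sketch passes the encoding directly to the XmlTextWriter constructor; the writer then emits the matching encoding declaration and byte stream (byte order mark included) without anyone touching BaseStream or GetPreamble.

using System.Text;
using System.Xml;

class EncodingExample
{
    static void Main()
    {
        // The constructor takes the encoding; WriteStartDocument then writes
        // <?xml version="1.0" encoding="utf-8"?> and the rest of the document
        // is serialized as UTF-8.
        XmlTextWriter writer = new XmlTextWriter("test.xml", Encoding.UTF8);
        writer.Formatting = Formatting.Indented;

        writer.WriteStartDocument();
        writer.WriteElementString("greeting", "Hello, world");
        writer.WriteEndDocument();
        writer.Close();
    }
}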

If you really want to dive deep into issues involving specifying the encoding of XML documents and the CLR, take a look at this discussion in Robert McLaws's weblog.

PS: One of my pet peeves is the way people misuse the term XML infoset to mean "things in XML I don't care about" even though there is a precise definition (nay, an entire spec) that describes what it means. The document information item clearly has a [character encoding scheme] property, which means character encodings are an XML infoset thing.


 

Categories: XML

November 14, 2003
@ 05:11 AM

Irwando the Magnificent (king of SQLXML) just pointed me at iPocalypse Photoshop. A number of the pseudo-engravings are quite amusing, my favorites are "Stolen music is better than sex" and "once you've had small and white..."

The photoshopped iPods in the scenes from Eddie Murphy's Haunted Mansion are also worth a snicker or two.


 

It looks like it's confirmed that I'll be attending XML 2003.

Should be fun.


 

Categories: Ramblings

Oleg Tkachenko writes

Just found new beast in the Longhorn SDK documentation - OPath language:

The OPath language is the query language used to query for objects using an ObjectSpace. The syntax of OPath also allows you to query for objects using standard object oriented syntax. OPath enables you to traverse object relationships in a query as you would with standard object oriented application code and includes several operators for complex value comparisons.

Orders[Freight > 5].Details.Quantity > 50 OPath expression should remind you something familiar. Object-oriented XPath cross-breeded with SQL? Hmm, xml-dev flamers would love it.

The approach seems to be exactly opposite to ObjectXPathNavigator's one - instead of representing object graphs in XPathNavigable form, brand new query language is invented to fit the data model. Actually that makes some sense, XPath as XML-oriented query language can't fit all. I wonder what Dare think about it. More studying is needed, but as for me (note I'm not DBMS-oriented guy though) it's too crude yet

Oleg is right that an XML-oriented query language like XPath isn't a good fit for querying objects. There is definitely an impedance mismatch between XML and objects, many aspects of which were pointed out by Erik Meijer in his paper Programming with Circles, Triangles and Rectangles. A significant number of the constructs and semantics of XPath simply don't make sense in a language designed to query objects. The primary construct in XPath is the location step, which consists of an axis, a node test and zero or more predicates, and of these both the axis and the node test are out of place in an object query language.
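
As a quick reminder of what a location step looks like, here is a small C# example with one step written out in unabbreviated axis::node-test[predicate] form; the Orders/Freight names echo Oleg's OPath example, but the document shape is invented for the illustration.

using System;
using System.Xml;

class LocationStepExample
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Orders><Order><Freight>7</Freight></Order></Orders>");

        // child::Order[Freight > 5] is a single location step: the child axis,
        // the node test 'Order' and one predicate. The abbreviated syntax
        // merely hides the axis; it is still there.
        XmlNodeList heavy = doc.SelectNodes("/Orders/child::Order[Freight > 5]");
        Console.WriteLine(heavy.Count); // prints 1
    }
}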

From the XPath Grammar, there are 13 axes of which almost none make sense for objects besides self. They are listed below

[6]    AxisName    ::=    'ancestor'
| 'ancestor-or-self'
| 'attribute'
| 'child'
| 'descendant'
| 'descendant-or-self'
| 'following'
| 'following-sibling'
| 'namespace'
| 'parent'
| 'preceding'
| 'preceding-sibling'
| 'self'

The ones related to document order such as preceding, following, preceding-sibling and following-sibling don't really apply to objects since there is no concept of order amongst the properties and fields of a class. The attribute axis is similarly irrelevant since there is no equivalent of the distinction between elements and attributes among the fields and properties of a class.

The axes related to document hierarchy such as parent, child, ancestor, descendant, etc. look like they may make sense to map to object oriented concepts until one asks what exactly is meant to be the parent of an object. Is it the base class, or the object to which the current object belongs as a field or property? Most would respond that it is the latter. However, what happens when multiple objects have the same object as a field, which is often the case since object structures are graph-like, not tree-like as XML structures are? It also gets tricky when an object that is a field in one class is a member of a collection in another class. Is the object a child of the collection? If so, what is the parent of the object? If not, what is the relationship of the object to the collection? The questions can go on...

On the surface the namespace axis sounds like it could map to concepts from object oriented programming since languages like C#, C++ and Java all have a concept of a "namespace". However, namespace nodes in the XPath data model have distinct characteristics (such as the fact that each element node in a document has its own set of namespace nodes, regardless of whether these namespace nodes represent the same mapping of a prefix to a namespace URI) that have no real analog in the namespaces of object oriented languages.

A similar argument can also be made about node tests, the second primary construct in an XPath location step. A node test specifies either a name or a type of node to match. A number of XPath node types don't have equivalents in the object oriented world, such as comment and processing instruction nodes. Other node types, such as text and element nodes, are problematic when one begins to try to tie them into the various axes such as the parent axis.

Basically, a significant amount of XPath is not really applicable to querying objects without changing the semantics of certain aspects of the language in a way that conflicts with how XPath is used when querying XML documents.

As for how this compares to my advocacy of XML-to-object mapping techniques such as the ObjectXPathNavigator, the answer is simple: XML is the universal data interchange format, and the software world is moving to a situation where all the major sources of important data can be accessed or viewed as XML, from office documents to network messages to information locked within databases. It makes sense, then, that in creating this universal data access layer one creates a way for all interesting sources of data to be viewed as XML, so they too can participate as input to data aggregation technologies such as XSLT or XQuery, enabling the reuse of XML technologies for processing and manipulating them.
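
As a rough sketch of that payoff (the simplest possible version; ObjectXPathNavigator itself skips the serialization step and exposes the object graph directly as an XPathNavigator), once an object can be viewed as XML it immediately becomes input for generic XML machinery such as an XSLT stylesheet. The Order class and the orders.xsl stylesheet below are hypothetical.

using System;
using System.IO;
using System.Xml.Serialization;
using System.Xml.XPath;
using System.Xml.Xsl;

public class Order
{
    public decimal Freight = 7.5m;
    public int Quantity = 60;
}

class XmlViewOfAnObject
{
    static void Main()
    {
        // Serialize an ordinary object into its XML view...
        StringWriter xml = new StringWriter();
        new XmlSerializer(typeof(Order)).Serialize(xml, new Order());

        // ...and hand that view to an off-the-shelf XML tool, in this case XSLT.
        XslTransform transform = new XslTransform();
        transform.Load("orders.xsl");
        transform.Transform(
            new XPathDocument(new StringReader(xml.ToString())), null, Console.Out);
    }
}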


 

Categories: Life in the B0rg Cube | XML

November 11, 2003
@ 11:10 PM

I noticed the following RDF Interest Group IRC chat log discussing my recent post More on RDF, The Semantic Web and Perpetual Motion Machines in my referrer logs. I found the following excerpts quite illuminating

15:43:42 <f8dy> is owl rich enough to be able to say that my <pubDate>Tue, Nov 11, 2003</pubDate> is the same as your <dc:date>2003-11-11</dc:date>

15:44:35 <swh> shellac: I believe that XML datatypes are...

...

16:08:15 <f8dy> that vocabulary also uses dates, but it stores them in rfc822 format

16:08:51 <f8dy> 1. how do i programmatically determine this?

16:08:58 <JHendler> ah, but you cannot merge graphs on things without the same URI, unless you have some other way to do it

16:09:02 <f8dy> 2. how do i programmatically convert them to a format i understand?

...

16:09:40 <shellac> 1. use

...

16:10:13 <shellac> 1. use a xsd library

16:10:32 <shellac> 2. use an xsd library

...

16:11:08 <JHendler> n. use an xsd library :->

16:11:30 <shellac> the graph merge won't magically merge that information, true

16:11:34 <JHendler> F: one of my old advisors used to say the only thing better than a strong advocate is a weak critic

This argument cements my suspicion that using RDF and Semantic Web technologies is a losing proposition when compared to using XML-centric technologies for information interchange on the World Wide Web. It is quite telling that none of the participants who tried to counter my arguments gave a cogent response besides "use an xsd library", when in fact anyone with a passing knowledge of XSD would inform them that XSD only supports ISO 8601 dates and would barf on RFC 822 dates if asked to treat them as dates. In fact, this is a common complaint about XSD dates from our customers w.r.t. internationalization [that, and the fact that decimals use a period as the delimiter for fractional digits instead of a comma].

Even in this simple case of mapping equivalent elements (dc:date and pubDate), the Semantic Web advocates cannot show how their vaunted ontologies solve a problem the average RSS aggregator author solves in about 5 minutes of coding using off-the-shelf XML tools. It is easy to say philosophically that dc:date and pubDate are equivalent since, after all, they are both dates, but it is another thing entirely to write code that knows how to treat them uniformly. I am quite surprised that such a straightforward real-world example cannot be handled by Semantic Web technologies. Clay Shirky's The Semantic Web, Syllogism, and Worldview makes even more sense now.
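
For what it's worth, here is roughly what those 5 minutes of coding look like in C#. This is only a sketch: the "r" pattern covers the common GMT form of an RSS pubDate, and a real aggregator needs a few extra fallback patterns for feeds that use numeric offsets or two-digit years.

using System;
using System.Globalization;

class DateNormalizer
{
    // Parse an RFC 822/1123 style pubDate and re-emit it in the ISO 8601
    // form that dc:date (and XSD's date types) expect.
    static string PubDateToDcDate(string pubDate)
    {
        DateTime parsed = DateTime.ParseExact(pubDate, "r", CultureInfo.InvariantCulture);
        return parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(PubDateToDcDate("Tue, 11 Nov 2003 15:43:42 GMT"));
        // prints 2003-11-11
    }
}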

One of my co-workers recently called RDF an academic plaything. After seeing how many of its advocates ignore the difficult real-world problems faced by software developers and computer users today while pretending that obtuse solutions to trivial problems are important, I've definitely lost any interest I had left in investigating the Semantic Web any further.


 

Categories: XML