Thursday, 20 November 2003 - Dare Obasanjo's weblog

November 20, 2003

@ 09:39 PM

Misinformation on XML.com: Microsoft and Binary XML

In his recent article entitled Binary Killed the XML Star?, Kendall Clark writes

Many XML proponents and users came out of various binary exchange and format camps, and they are very unwilling to return to what were for them, or so it would seem, dark days. In this case, however, given the real power of those who most seem to want a binary variant -- including Sun, IBM, and Microsoft -- they may have to adopt a carefully tactical plan to limit the damage, rather than preventing the fight completely.

This claim by Kendall Clark seems to contradict the conclusions in the position papers provided by both Microsoft and IBM at the W3C Workshop on Binary Interchange of XML Information Item Sets.

IBM's position paper concludes with

IBM believes that wherever possible, implementations of the existing XML 1.x Recommendation should be optimized to meet the needs of customers. While we expect to see non-standard binary forms used internally within certain vendors’ implementations, including perhaps our own, we are not yet convinced that there is justification to standardize an interchange format other than XML 1.x. We thus believe that it would be premature for the W3C to launch a formal workgroup, or to recharter an existing group, to develop a Binary XML Recommendation

Microsoft's position paper concludes with

For different classes of applications, the criterion (minimize footprint or minimize parse/generate time) for the binary representation is different and often conflicting. There is no single criterion that optimizes all applications. Consequently, a binary standard could result in a suite of allowable representations that clients and servers must be prepared to receive and process. This is a retrograde step from the portability goals of XML 1.0. Furthermore, the optimal binary representation depends on the machine and OS architectures on each end — translating between binary representations negates much of the advantages that binary XML has over text.

Besides the position paper from Microsoft, there have been many comments, both in weblogs and on mailing lists, from Microsoft people against this movement for a standardized binary XML format (oxymoron that it is). There have been weblog posts by myself, Joshua Allen and Omri Gazitt (all of whom work on XML technologies at Microsoft) decrying the movement towards binary XML and thus potential fragmentation of the XML world.

There have also been a number of posts by Microsoft employees against standardized binary XML on mailing lists such as XML-DEV, some of which have been quoted on Elliotte Rusty Harold's Cafe con Leche XML News website

I fear that splitting the interop story of XML into a textual and Infoset-based/binary representation, we are going to get the "divide and conquer" effect that in the end will make XML just another ASN.1: a niche model that does not deliver the interop it promises and we will be back to lock-in.

--Michael Rys on the xml-dev mailing list, Tue, 18 Nov 2003

XML has succeeded in large part because it is text and because it is perceived as "the obvious choice" to many people. The world was a lot different before XML came around, when people had to choose between a dizzying array of binary and text syntaxes (including ASN.1). Anyone who tries to complicate and fragment this serendipitous development is, IMO, insane.

--Joshua Allen on the xml-dev mailing list, Tue, 18 Nov 2003

Unfortunately, it seems that Kendall Clark must have missed the various discussions, weblog posts and the position paper where Microsoft's view of the importance of textual XML 1.0 was put forth.

Categories: XML

November 18, 2003

@ 08:46 PM

Comments [5]

Confusing Hype with Reality

Robert Scoble writes

Rob Fahrni answered back and said "Scoble's on one of the best teams inside Microsoft." I've landed on a good one, yes, but I totally disagree that it's the best. I've seen tons of teams that are doing interesting things. By the way, he says Visio is a failure? Well, does the Visio team have any webloggers? Does it have an RSS feed? How are you supposed to sell software if you don't have a relationship with your customers?

On the surface it reads like Robert Scoble is claiming that if you don't have a blogger on your team or an RSS feed then you don't have a relationship with your customers. This is probably the funniest thing I've seen all week.

Scoble's post reminds me of the Cult of the Cluetrain Manifesto article by John Dvorak. It's always unfortunate when people take a bunch of decent ideas and turn them into near-religious beliefs. Being in touch with your customers in an informal and accessible manner is nice, but it isn't the only way to communicate with your customers, nor is it necessary to make you successful.

I love my iPod. I love my TiVo. I love my Infiniti G35. I love Mike's Hard Lemonade and Bacardi O³. None of these products have official webloggers that I'm aware of, nor do they have an RSS feed for their websites that I'm subscribed to. Furthermore, if competing products did, it wouldn't change the fact that I'd still be all over my iPod/TiVo/G35/etc.

Blogging and RSS feeds are nice, but they are the icing on the cake of interacting with your customers and satisfying their needs, not the be-all and end-all of it.

Categories: Ramblings

November 18, 2003

@ 04:57 PM

Comments [0]

XQuery is a Better XPath not a Better XSLT

Elliotte Rusty Harold writes

In XSLT 1.0 all output is XML. A transformation creates a result tree, which can always be serialized as either an XML document or a well-formed document fragment. In XSLT 2.0 and XQuery the output is not a result tree. Rather, it is a sequence. This sequence may contain XML; but it can also contain atomic values such as ints, doubles, gYears, dates, hexBinaries, and more; and there's no obvious or unique serialization for these things. For instance, what exactly are you supposed to do with an XQuery that generates a sequence containing a date, a document node, an int, and a parentless attribute? How do you serialize this construct? That a sequence has no particular connection to an XML document was very troubling to many attendees.

Looking at it now, I'm seeing that perhaps the flaw is in thinking of XQuery as like XSLT; that is, a tool to produce an XML document. It's not. It's a tool for producing collections of XML documents, XML nodes, and other non-XML things like ints. (I probably should have said it that way last night.) However, the specification does not define any concrete serialization or API for accessing and representing these non-XML collections. That's a pretty big hole left to implementers to fill.

The main benefit of XQuery is as a better way to retrieve data from one or more XML documents than previous methods (i.e. a better XPath), not as a way to transform one XML structure to another (i.e. XSLT). I assume that Elliotte Rusty Harold isn't familiar with APIs that provide XPath as a standalone language, such as the .NET Framework's XPathNavigator, the Oracle XDK, or Jaxen, since all of these provide a way to get atomic values (number, string, or boolean) as well as nodes from querying an XML document.

Similarly, there is no well defined way to serialize the results of performing an arbitrary XPath on an XML document. The tough parts for implementers aren't atomic values or XML fragments, as Elliotte Rusty Harold describes, but more mundane things like attribute values. For instance, consider the following document

<test xmlns:e="http://www.example.com" e:id="1" />

queried using the following XPath expression

/*/@*[1]

which returns the first attribute of the document element. How would one serialize these results? There are a bunch of options such as

e:id="1"
{http://www.example.com}id="1"
@e:id="1"
{xmlns:e="http://www.example.com"}id="1"

All of which I could argue are valid serializations of the attribute node returned by that query. By the way, the .NET Framework uses the first serialization if one calls XmlNode.OuterXml on the XmlAttribute object returned by executing that query on an XmlDocument object.
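To make the ambiguity concrete, here is a minimal sketch in Python rather than .NET. It reads the attribute with the standard library's ElementTree, which happens to report attribute names in the second style above, and then builds all four candidate strings. The helper names first_attribute and candidate_serializations are mine, and since ElementTree discards the prefix, the prefix "e" is passed in as an assumption.

```python
import xml.etree.ElementTree as ET

DOC = '<test xmlns:e="http://www.example.com" e:id="1" />'

def first_attribute(xml_text):
    """Return (namespace URI, local name, value) of the root's first attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attribute names as "{uri}local".
    clark_name, value = next(iter(root.attrib.items()))
    if clark_name.startswith("{"):
        uri, local = clark_name[1:].split("}", 1)
    else:
        uri, local = "", clark_name
    return uri, local, value

def candidate_serializations(uri, local, value, prefix="e"):
    """Build the four candidate strings; the prefix is an assumption."""
    return [
        f'{prefix}:{local}="{value}"',
        f'{{{uri}}}{local}="{value}"',
        f'@{prefix}:{local}="{value}"',
        f'{{xmlns:{prefix}="{uri}"}}{local}="{value}"',
    ]

uri, local, value = first_attribute(DOC)
for text in candidate_serializations(uri, local, value):
    print(text)
```

Each string carries the same three facts (namespace, local name, value), which is why none of them can claim to be the one true serialization.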

So what's my point? That the situation Elliotte Rusty Harold bemoans as being unique to XQuery has always existed with XPath. Even more, as Oleg Tkachenko points out, there is an XSLT 2.0 and XQuery 1.0 Serialization draft recommendation which specifies how to serialize instances of the XPath 2.0/XQuery data model, and which even resolves the question of how one would serialize the results of the query above

It is a serialization error if an item in the sequence is an attribute node or a namespace node.

Short answer, you can't.
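A rough sketch of what that rule means for an implementer, in Python. This is only loosely modeled on the sentence quoted above and is not the draft's actual algorithm: AttributeNode, SerializationError and serialize_sequence are hypothetical names, and joining items with a single space is my simplification.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical stand-in for a parentless attribute node in a result sequence.
AttributeNode = namedtuple("AttributeNode", ["name", "value"])

class SerializationError(Exception):
    """Raised when an item in the sequence has no serialized form."""

def serialize_sequence(items):
    """Serialize a mixed sequence of atomic values and element nodes."""
    parts = []
    for item in items:
        if isinstance(item, AttributeNode):
            # Mirrors the quoted rule: a bare attribute node is an error.
            raise SerializationError("cannot serialize attribute node " + item.name)
        if isinstance(item, ET.Element):
            parts.append(ET.tostring(item, encoding="unicode"))
        else:
            # Atomic values (ints, strings and so on) become their string form.
            parts.append(str(item))
    return " ".join(parts)

print(serialize_sequence([2003, ET.fromstring("<a>x</a>")]))
```

Passing a sequence that contains AttributeNode("e:id", "1") raises SerializationError instead of picking one of the four spellings listed earlier.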

Categories: XML

November 18, 2003

@ 02:15 PM

Comments [1]

Confusing Features with Functionality

Mike Sanders says that businesses are clamoring for web based apps:

SEPTEMBER 29, 2003 ( INFOWORLD ) - Web applications rule the enterprise. That's the indisputable conclusion to be drawn from this year's InfoWorld Programming Survey. Despite directives from Microsoft Corp. and others that developers abandon server-based HTML applications for fat desktop clients, the ease of "zero deployment" through the browser continues to win the day.

Only a fool what count Microsoft out. But only a fool would ignore what businesses are proclaiming loudly from their desktops - we want more browser apps now

[via James Robertson]

One of the biggest problems I face as a Program Manager is that product teams often focus on features instead of functionality. Since Microsoft is very developer-centric, it is very easy for us to focus on the implementation details of customer requests instead of focusing on their requirements and business cases. The job of a PM is to ensure that we focus on the latter instead of the former.

The InfoWorld article and the subsequent comment by Mike Sanders are examples of concentrating on features ([D]HTML applications) as opposed to functionality (zero deployment applications).

The primary message from the InfoWorld article isn't that users do not want rich client applications, as Mike Sanders implies, but that they'd rather have zero deployment than a rich client. The lesson I take away from this is that if one plans to provide a rich client solution then it should be a zero deployment solution as well*.

* In today's world, this typically means using Flash or Javascript.

Categories: Life in the B0rg Cube

November 17, 2003

@ 04:10 PM

Comments [4]

Open and Royalty-Free License For Office 2003 XML Reference Schemas

From Microsoft Announces Availability of Open and Royalty-Free License For Office 2003 XML Reference Schemas

Microsoft Corp. today announced the availability of a royalty-free licensing program for its Microsoft® Office 2003 XML Reference Schemas and accompanying documentation. ... Microsoft's new Office 2003 versions of Word, Excel and the InfoPath (TM) information-gathering program utilize schemas that describe how information is stored when documents are saved as XML....

To ensure broad availability and access, Microsoft is offering the royalty-free license using XML Schema Definitions (XSDs), the cross-industry standard developed by the W3C. The license provides access to the schemas and full documentation to interested parties and is designed for ease of use and adoption. The Microsoft Office 2003 XML Reference Schemas include WordprocessingML (Microsoft Office Word 2003), SpreadsheetML (Microsoft Office Excel 2003) and FormTemplate XML schemas (Microsoft Office InfoPath 2003).

The biggest gripe when Office 2003's XML support was announced was that the schemas for WordprocessingML (aka WordML) and co. were proprietary. This was reported in a number of fora including Slashdot & C|Net news. I wonder how many will carry the announcement that these schemas are available for all to peruse and reuse in a royalty-free manner?

Update: On C|Net news: Microsoft pries open Office 2003

Update2: On Slashdot: Microsoft Word Document ML Schemas Published

Categories: XML

November 17, 2003

@ 06:32 AM

Comments [1]

XAML Passes the XML Litmus Test

George Mladenov asked

Why does XAML need to be (well-formed) XML in the first place?

To which Rob Relyea responds with the following reasons

1. Without extra work from the vendors involved, we’d like all XML editors be able to work with XAML.

2. We’d like transformations (XSLT, other) be able to move content to/from XAML.

3. We didn’t want to write our own file parsing code, the parser code we do have is built on top of System.XML.XmlTextReader. We are able to focus on our value add.

Thus it looks like XAML's use of XML passes the XML Litmus Test, specifically

Using XML for a software development project buys you two things (a) the ability to interoperate better with others and (b) a number of off-the-shelf tools for dealing with format. If neither of these things apply to a given situation then it doesn't make much sense to use XML.

However there are tradeoffs to using XML, some of which Rob points out. They are listed below with some of my opinions

1. We want to enable setting the Background property on a Button (for example) in one of two ways:

a. Using a normal attribute - <Button Background="Red">Click Here</Button>

b. Using our compound property syntax –

...

c. Ideally if somebody tried to use both syntaxes at the same time, we could error. XML Schema – as far as I am aware – isn’t well equipped to describe that behavior.

Being the PM for W3C XML Schema technologies in the .NET Framework means I get to see variations of this request regularly. This feature is typically called co-occurrence constraints and is lacking in W3C XML Schema, but it is supported by other XML schema languages like RELAX NG and can be added to W3C XML Schema using Schematron annotations. Given the existing complexity of W3C XML Schema's conflicting design goals (validation language vs. type system) and contradictory rules, I for one am glad this feature doesn't exist in the language.

However, this means that users who want to describe their schemas using W3C XML Schema need to face the fact that not all the constraints of their vocabulary can be expressed in a schema. This is always the case; it's just that some constraints seem significant enough to be in the schema while others are OK being checked in code during "business logic processing". In such cases there are basically three choices: (i) try to come as close as possible to describing the content model in the schema, which sometimes may lead to what us language lawyers like to call "gross hacks"; (ii) use an alternate XML schema language or extend the W3C XML Schema language in some way; or (iii) live with the fact that some constraints won't be describable in the schema.

It is worth noting that although the W3C XML Schema recommendation contains what seems like a schema for Schema (sForS) (i.e. the rules of W3C XML Schema are themselves described as a schema), this is in fact not the case. The schema in the spec, although normative, is invalid, and even if it were valid it still would not come close to rigidly specifying all the rules of W3C XML Schema. The way I look at it is simple: if the W3C XML Schema working group couldn't come up with a way to fully describe an XML vocabulary using XML Schema, then the average vocabulary designer shouldn't be bothered if they can't either.
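For the third choice, checking a co-occurrence constraint in code is usually only a few lines. Here is a minimal Python sketch using a made-up vocabulary: the widget element, the color attribute and the color child element are all hypothetical and are not XAML's actual syntax, and check_not_both is my own helper name.

```python
import xml.etree.ElementTree as ET

def check_not_both(element, attr_name, child_tag):
    """Raise ValueError if a property is set as an attribute and as a child."""
    has_attr = attr_name in element.attrib
    has_child = element.find(child_tag) is not None
    if has_attr and has_child:
        raise ValueError(
            f"<{element.tag}> sets '{attr_name}' both as an attribute "
            f"and as a <{child_tag}> child element"
        )

# Hypothetical instances: either form alone is fine, both together is an error.
ok = ET.fromstring('<widget color="red">Click Here</widget>')
bad = ET.fromstring('<widget color="red"><color>blue</color></widget>')

check_not_both(ok, "color", "color")  # passes silently
try:
    check_not_both(bad, "color", "color")
except ValueError as error:
    print(error)
```

The schema can still describe each form separately; the mutual exclusion simply lives in the processing code.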

2. It is a bit strange, for designers or developers moving from HTML to XML. HTML is forgiving. XML isn’t. Should we shy away from XML so that people don't have to quotes around things? I think not.

Having to put quotes around everything isn't the biggest pain in the transition from HTML to XML, and after a while it comes naturally. A bigger pain is dealing with ensuring that nested tags are properly closed and I'm glad I found James Clark's nxml-mode for Emacs which has helped a lot with this. The XML Editor in the next version of Visual Studio should also be similarly helpful in this regard.

The lack of the HTML predefined entities is also a bit of a culture shock when moving to XML from HTML, and one that some consider a serious bug in XML; I tend to disagree.

3. It is difficult to keep XAML as a human readable/writable markup, as that isn’t one of XML’s goals. I think it needs to be one of XAML’s goals. It is a continual balancing act.

Actually, one of the main goals of XML is to be human-readable, at least as human-readable as HTML was, since it was originally intended to replace HTML. There's a quick history lesson in my SGML on the Web: A Failed Dream? post from last month.

Categories: XML

November 17, 2003

@ 04:30 AM

Comments [4]

Death in Cartoons

I just finished watching a TiVoed episode of Justice League where a character died in battle. The character had been a moderately recurring one who was given some depth in the preceding episode before being killed in the following one. Coupled with Disney's Brother Bear, in which a major character that's just been introduced ends up dying and another whose significance we learn later dies as well, it seems like death in children's cartoons is no longer taboo.

I remember watching cartoons like Voltron & Thunder Cats as a kid and thinking that the fact that the major characters were never at risk of death made rooting for the good guys or against the bad guys a waste of time. Of course, I was one of the kids who was deeply affected when Optimus Prime bought it in Transformers: The Movie. That death was a solitary event in the cartoon landscape which didn't lead to the start of a trend as I expected and itself was diluted by the fact that they kept bringing Optimus Prime back in one shape or form every other episode.

This trip down memory lane makes me feel nostalgia for old episodes of my favorite cartoons. Time to go bargain hunting on Amazon.

Categories: Ramblings

November 15, 2003

@ 11:31 PM

Comments [0]

RSS Bandit Screenshots Galore

We finally got around to adding some screen shots to the RSS Bandit wiki.

For those who are curious, there should be another release in the next couple of weeks. This should be mostly a bug fix release with a number of improvements in responsiveness of the GUI. The only noticeable new features should be a new preferences tab for adding search engines to the ones available from the search bar, the ability to apply themes to feed items from the preferences dialog without having to exit the dialog and the ability to search RSS items on disk.

Hopefully, if I can get some cooperation from a couple of folks there also may be some changes to the subscription harmonization functionality.

Categories: RSS Bandit

November 14, 2003

@ 07:00 PM

Comments [1]

The Web is in the Eye of the Beholder

Robert Scoble writes

Microsoft has 55,000 employees. $50 billion or so in the bank.

Yet what has gotten me to use the Web less and less lately? RSS 2.0.

Seriously. I rarely use the browser anymore (except to post my weblog since I use Radio UserLand).

See the irony there? Dave Winer (who at minimum popularized RSS 2.0) has done more to get me to move away from the Web than a huge international corporation that's supposedly focused on killing the Web.

Diego Duval responds

Robert: the web is not the browser.

Robert says that he's "using the web less and less" because of RSS. He's completely, 100% wrong.

RSS is not anti-web, RSS is the web at its best.

The web is a complex system, an interconnection of open protocols that run on any operating system
...
Let me say it again. The web is not the browser. The web is protocols and formats. Presentation is almost a side-effect.

Both of them have limited visions of what actually constitutes the World Wide Web. The current draft of the W3C's Architecture of the World Wide Web gives a definition of the Web that is more consistent with reality and highlights the limitations of both Diego's and Robert's opinions of what constitutes the WWW. The document currently states

The World Wide Web is a network-spanning information space consisting of resources, which are interconnected by links defined within that space. This information space is the basis of, and is shared by, a number of information systems. Within each of these systems, agents (e.g., browsers, servers, spiders, and proxies) provide, retrieve, create, analyze, and reason about resources.

This contradicts Robert's opinion that the Web is simply about HTML pages that you can view in a Web browser, and it contradicts Diego's statement that the Web is about "open" protocols that run on "any" operating system. There are a number of technologies that populate the Web whose "open-ness" some may question. I know better than to cast stones when I live in a glass house, but there are a few prominent examples that come to mind.

The way I read it, the Web is about URIs that identify resources that can be retrieved using HTTP by user agents. In this case, I agree with Diego that RSS 2.0 is all about the Web. A news aggregator is simply a Web agent that retrieves a particular Web resource (the RSS feed) at periodic intervals on behalf of the user using HTTP as the transfer protocol.
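That description of an aggregator fits in a handful of lines. The Python sketch below is only an illustration and is not how RSS Bandit or any other real aggregator is built: fetch and item_titles are hypothetical helper names, the sample feed is made up, and periodic polling is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET

def fetch(url, timeout=10):
    """Retrieve the resource identified by a URI using HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def item_titles(rss_bytes):
    """Return the title of every item in an RSS 2.0 document."""
    root = ET.fromstring(rss_bytes)
    return [item.findtext("title", default="") for item in root.iter("item")]

# A made-up feed standing in for what fetch() would return.
SAMPLE = b"""<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://www.example.com/1</link></item>
<item><title>Second post</title><link>http://www.example.com/2</link></item>
</channel></rss>"""

print(item_titles(SAMPLE))
```

A real agent would call fetch on a timer for each subscribed feed URI and hand the bytes to the parser; the browser never enters the picture, yet every step is URI, HTTP and resource.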

Categories: Ramblings

November 14, 2003

@ 04:22 PM

Comments [0]

Is XML About Text or Not?

Fumiaki Yoshimatsu writes

Why does someone still think that they have to write Unicode BOMs by themselves, digging deep inside XmlTextWriter.BaseStream and UnicodeEncoding.GetPreamble? Encoding hint in the XML declarations and Unicode BOMs are all about XML 1.0 thing, but WriteStartElement and WriteStartDocument are not. They are InfoSet thing, so they do not have anything to do with the serialization format. Think about XmlNodeWriter for example. Why does XmlNodeWriter NOT have any constructor that have a parameter of type Encoding? Why does it always call XmlDocument.CreateXmlDeclaration with null as the second argument?

This is a common point of confusion for users of XML in the CLR. XmlNodeWriter doesn't have a parameter of type Encoding because it writes to an XmlDocument, which is stored in memory, and all strings in the CLR are in UTF-16 encoding. Setting the encoding only matters when saving the XmlDocument to a stream. As for having to dig into XmlTextWriter.BaseStream to set the encoding, I find this weird considering that the XmlTextWriter constructor has a number of ways to specify the encoding when instantiating the class. Since XML 1.0 mandates that an XML document can only have one encoding, there is no reason for methods like WriteStartElement and WriteStartDocument to concern themselves with encoding issues.
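The same split between the in-memory tree and the serialized bytes shows up outside the CLR. As a hedged illustration in Python (not the .NET API discussed above), the standard library's ElementTree builds the tree without any encoding and only asks for one when the tree is written out; the element name and text are made up.

```python
import codecs
import xml.etree.ElementTree as ET

# Building the tree involves no encoding: the text is just an in-memory string.
root = ET.Element("greeting")
root.text = "caf\u00e9"

# The encoding is chosen only at serialization time.
utf8_bytes = ET.tostring(root, encoding="utf-8")
utf16_bytes = ET.tostring(root, encoding="utf-16")

# The UTF-16 output begins with a byte order mark written by the serializer.
print(utf16_bytes.startswith(codecs.BOM_UTF16))

# Both byte streams parse back to the same in-memory content.
print(ET.fromstring(utf8_bytes).text == ET.fromstring(utf16_bytes).text)
```

The calls that build the tree never mention an encoding, and the byte order mark is written by the serializer rather than by the caller.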

If you really want to dive deep into issues involving specifying the encoding of XML documents and the CLR take a look at this discussion in Robert McLaws's weblog.

PS: One of my pet peeves is the way people misuse the term XML infoset to mean "things in XML I don't care about" even though there is a precise definition (nay, an entire spec) that describes what it means. The document information item clearly has a [character encoding scheme] property, which means character encodings are an XML infoset thing.

Categories: XML

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
