November 7, 2003
@ 03:23 PM

I've posted previously on why I think the recent outcry for the W3C to standardize on a binary format for the representation of XML information sets (aka "binary XML") is a bad idea which could cause significant damage to interoperability on the World Wide Web. Specifically I wrote:

Binary XML Standard(s): Just Say No

Omri and Joshua have already posted the two main reasons why attempting to create a binary XML standard is folly: (a) the various use cases and requirements are contradictory (small message size for low bandwidth situations vs. minimal parsing/serialization time for situations where minimizing processing time is paramount), thus a single standard is unlikely to satisfy a large proportion of the requesters, and (b) creation of a binary XML standard, especially by an organization such as the W3C, muddies the water with regards to interop; people already have to worry about the interop pain that will occur whenever XML 1.1 gets out of the door (which is why Elliotte Rusty Harold advises avoiding it like the plague), let alone adding one or more binary XML standards to the mix.

I just read the report from the W3C Workshop on Binary Interchange of XML Information Item Sets and I'm glad to see the W3C did not [completely] bow to pressure from certain parties to start work on a "binary XML" format. The following is the conclusion from the workshop:

CONCLUSIONS

The Workshop concluded that the W3C should do further work in this area, but that the work should be of an investigative nature, gathering requirements and use cases, and prepare a cost/benefit analysis; only after such work could there be any consideration of whether it would be productive for W3C to attempt to define a format or method for non-textual interchange of XML.

See also Next Steps below for the conclusions as they were stated at the end of the Workshop.

This is new ground for the W3C. Usually W3C working groups are formed to take competing requirements from umpteen vendors and hash out a spec. Of course, the problem with this approach is that it doesn't scale. It may have worked for HTML when the competing requirements primarily came from two vendors, but now that XML is so popular it doesn't work quite as well. As Tim Bray put it, "any time there's a new initiative around XML, there are instantly 75 vendors who want to go on the working group".

It's good to see the W3C decide to take an exploratory approach instead of just forging ahead to create a spec that tries to satisfy myriad competing and contradictory requirements. They've gone the forge-ahead route before with W3C XML Schema (and to a lesser extent with XQuery), and the software industry is still having difficulty digesting the results. Hopefully at the end of their investigation they'll come to the right conclusions.


 

Categories: XML

The one where I find out why I'm getting so many referrals from InfoWorld's website.
 

Categories: Ramblings

One of the biggest concerns about RSS is the amount of bandwidth consumed by wasteful requests. Recently on an internal mailing list discussion there was a complaint about the amount of bandwidth wasted because weblog servers send a news aggregator an RSS feed containing items it has already seen. A typical news feed contains 10 - 15 news items, where the oldest is a few weeks old and the newest is a few days old. A typical user's news aggregator fetches an RSS feed about once every other day. This means on average at least half the items in an RSS feed are redundant to people who are subscribed to the feed, yet everyone (client & server) incurs bandwidth costs by having the redundant items appear in the feeds.

So how can this be solved? All the pieces to solve this puzzle are already on the table. Every news aggregator worth its salt (NetNewsWire, SharpReader, NewsGator, RSS Bandit, FeedDemon, etc.) uses HTTP conditional GET requests. What does that mean in English? It means that most aggregators send the server information about when they last retrieved the RSS feed via the If-Modified-Since HTTP header, and the hash code of the RSS feed provided by the server the last time it was fetched via the If-None-Match HTTP header. The interesting point is that although most news aggregators tell the server the last time they fetched the RSS feed, almost no weblog server I am aware of actually uses this information to tailor the content sent back in the RSS feed. The weblog software I use is guilty of this as well.
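For the curious, here's roughly what that exchange looks like from the client side. This is a minimal sketch using the HttpWebRequest class from v1.x of the .NET Framework; the feed URL, fetch date and ETag value are placeholders you'd pull from your aggregator's cache:

    using System;
    using System.Net;

    class ConditionalGet
    {
        static void Main()
        {
            HttpWebRequest request = (HttpWebRequest) WebRequest.Create("http://example.com/rss.xml");
            // Tell the server when we last fetched the feed...
            request.IfModifiedSince = new DateTime(2003, 11, 5);
            // ...and echo back the ETag (hash code) it handed us last time.
            request.Headers["If-None-Match"] = "\"abc123\"";

            try
            {
                HttpWebResponse response = (HttpWebResponse) request.GetResponse();
                // 200 OK: the feed changed, so parse it and cache the new ETag.
                Console.WriteLine("Feed changed; new ETag: " + response.Headers["ETag"]);
                response.Close();
            }
            catch (WebException e)
            {
                HttpWebResponse response = e.Response as HttpWebResponse;
                if (response != null && response.StatusCode == HttpStatusCode.NotModified)
                    Console.WriteLine("304 Not Modified; nothing to download");
                else
                    throw;
            }
        }
    }

Note that HttpWebRequest surfaces a 304 as a WebException rather than a normal response, which trips up a lot of first-time implementers.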

If you fetched my RSS feed yesterday or the day before, there is no reason for my weblog server to send you a 200K file containing five entries from last week, which it currently does. Actually it's worse: currently my weblog software doesn't even perform the simple check of seeing whether there are any new items before choosing to send down a 200K file.

Currently the only optimization weblog servers perform is coarse-grained: if there are no new items then an HTTP 304 response is sent, otherwise a feed containing the last n items is sent in its entirety. A further optimization is possible where the server sends down only those of the last n items newer than the If-Modified-Since date sent by the client.

I'll ensure that this change makes it into the next release of dasBlog (the weblog software I use), and if you use weblog software I suggest asking your software vendor to do the same.

UPDATE: There is a problem with the above proposal in that it calls for a reinterpretation of how If-Modified-Since is currently used by most HTTP clients and directly violates the HTTP spec, which states:

b) If the variant has been modified since the If-Modified-Since
         date, the response is exactly the same as for a normal GET.

The proposal is still valid, except that instead of misusing the If-Modified-Since header I'd propose that clients and servers respect a new custom HTTP header such as "X-Feed-Items-Newer-Than", whose value would be a date in the same format as that used by the If-Modified-Since header.
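To make the idea concrete, here's a rough sketch of what the server side could look like as an ASP.NET handler. This is illustrative only: the RssItem class and the LoadItems/WriteFeed helpers are hypothetical stand-ins for whatever data access and serialization code a real weblog server has, and the header name is simply the one proposed above:

    using System;
    using System.Collections;
    using System.IO;
    using System.Web;

    public class RssItem
    {
        public DateTime PubDate;
        public string Title;
        public string Description;
    }

    public class FeedHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            DateTime since = DateTime.MinValue;
            string header = context.Request.Headers["X-Feed-Items-Newer-Than"];
            if (header != null)
                since = DateTime.Parse(header);

            // Keep only the items the client says it hasn't seen yet.
            ArrayList newItems = new ArrayList();
            foreach (RssItem item in LoadItems())
                if (item.PubDate > since)
                    newItems.Add(item);

            if (newItems.Count == 0)
            {
                context.Response.StatusCode = 304; // nothing new, send no body
                return;
            }

            context.Response.ContentType = "text/xml";
            WriteFeed(context.Response.Output, newItems);
        }

        ArrayList LoadItems() { /* hypothetical: load the last n items from the weblog's store */ return new ArrayList(); }
        void WriteFeed(TextWriter output, ArrayList items) { /* hypothetical: serialize the items as RSS */ }
    }

A client that doesn't send the custom header gets the usual last n items, so the scheme degrades gracefully.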


 

Categories: XML

November 5, 2003
@ 02:51 AM

The following screenshot reminded me of recent comments by Robert Scoble on blogs and conversational marketing.


 

Categories: Ramblings

In response to my earlier post on his "Replace & Defend" theory, Jon Udell writes:

We have yet to even scratch the surface of what's possible given these circumstances. And now here comes WinFS with its own proprietary schema language. In recent years, it's been popular to layer innovation on top of base standards. So XSLT, XQuery, and SQL200n all rely on XPath, as WSDL relies on XSD. Yet no base standards beyond XML itself were of use to WinFS? It puzzles me. The things defined in WinFS don't seem exotic or mysterious. "A WinFS Contact type," the docs say, "has the Item super type. Person, Group, and Organization are some of its subtypes." If XSD can't model such things, we're in real trouble.

I doubt that anyone is claiming that W3C XML Schema cannot model containment or type derivations; however, the way it implements type derivation leaves much to be desired. In fact, this is the topic of an article I wrote that showed up on XML.com last week entitled XML Schema Design Patterns: Is Complex Type Derivation Unnecessary?. This is just another example of how things that seem straightforward end up being fairly complicated in W3C XML Schema. As Don Box puts it, XML Schema has already eclipsed C++ in terms of complexity. Given that the WinFS schema language isn't even about modelling XML documents, it seems perplexing that one would expect it to take on the complexity of using W3C XML Schema as its modelling language.

Of course WinFS does much more than model datatypes and structures. It's a highly sophisticated storage system that supports relational, object, and XML access styles, and that treats relationships among items as first-class objects in themselves (a potent feature I first encountered in the object database world years ago.) Great stuff! But the terminology of the Longhorn docs is revealing. Person, Contact, and Organization items are referred to as "Windows types," presumably because their schemata appear as classes in Longhorn's managed API. But to me these are universal types, not Windows types. I had expected them to be defined using XML Schema, and to be able to interoperate directly with SOAP payloads and XML documents on any platform.

Being defined using W3C XML Schema and being able to interoperate directly with XML documents on any platform are orthogonal. Information in relational databases like SQL Server is described using relational schema languages (i.e. SQL), yet this hasn't stopped Microsoft from creating myriad ways to extract XML from SQL Server, such as SQLXML, FOR XML queries and the .NET Framework's DataSet class, which allow information stored in relational databases to interoperate directly with SOAP payloads and XML documents. No one would claim that the fact that the data in a relational database is not defined using W3C XML Schema (or RELAX NG) makes it impossible to extract XML from a relational database or view it as an XML data source. WinFS is no different.
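As a quick illustration of that point, consider how little ceremony it takes to get XML out of a DataSet even though nothing in the database was ever described with XSD. The connection string and query below are placeholders:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class RelationalToXml
    {
        static void Main()
        {
            SqlDataAdapter adapter = new SqlDataAdapter(
                "SELECT ContactID, Name FROM Contacts",
                "server=(local);database=AddressBook;Integrated Security=SSPI");

            DataSet ds = new DataSet("Contacts");
            adapter.Fill(ds, "Contact");

            Console.WriteLine(ds.GetXml());   // relational rows surfaced as XML
            ds.WriteXmlSchema(Console.Out);   // and a W3C XML Schema inferred after the fact
        }
    }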

It's troubling, though, that the architects must be consulted to find out whether Longhorn's "Windows types" will be transferable to standards-based software.

I don't work on WinFS so there was no chance I'd make a definitive statement about what features they plan to support or not. This is simply common sense on my part, not an indication one way or the other about the degree of XML support in WinFS. With any luck I'll soon be able to get one of the WinFS folks to start blogging and then more accurate information can come straight from the horse's mouth.


 

Categories: Life in the B0rg Cube

I was planning to write this month's Extreme XML column on the recently released EXSLT.NET implementation produced by myself and a couple of others. One of the cool things about the EXSLT.NET project is that we added the ability to use the EXSLT extension functions in XPath queries over any data source that provides an XPathNavigator (i.e. implements IXPathNavigable). Thus one would be able to use functions like set:distinct and regexp:match when running XPath queries over objects that implement the IXPathNavigable interface such as the XPathDocument, XmlDocument or XmlDataDocument.  
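Here's the sort of query this enables. A minimal sketch, assuming EXSLT.NET exposes an XsltContext implementation (called ExsltContext below) that resolves the EXSLT function namespaces; the class name and the books.xml file are placeholders:

    using System;
    using System.Xml;
    using System.Xml.XPath;

    class ExsltQuery
    {
        static void Main()
        {
            // Any IXPathNavigable source works here: XPathDocument,
            // XmlDocument, XmlDataDocument, etc.
            XmlDocument doc = new XmlDocument();
            doc.Load("books.xml");

            XPathNavigator nav = doc.CreateNavigator();
            XPathExpression expr = nav.Compile("set:distinct(//book/author)");
            expr.SetContext(new ExsltContext()); // hypothetical: resolves the set: prefix and function
            XPathNodeIterator authors = nav.Select(expr);

            while (authors.MoveNext())
                Console.WriteLine(authors.Current.Value);
        }
    }

The trick is that XPathExpression.SetContext() accepts any XmlNamespaceManager, and XsltContext derives from it, which is the standard hook for injecting custom functions into XPath queries.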

In constructing my examples I decided that it would be even cooler to demonstrate the extensibility of the .NET Framework by showing how one could use the XPath extension functions in queries over implementations of XPathNavigator not provided by the .NET Framework, such as my perennial favorite, the ObjectXPathNavigator.

After fixing some bugs in the ObjectXPathNavigator implementation on MSDN (MoveToParent() didn't take you to the root node from the document element, and the navigator only exposed public properties but not public fields) I came across a problem which will probably turn into yet another project on GotDotNet workspaces. The heuristics the ObjectXPathNavigator uses to provide an XML view of an arbitrary object graph don't take into account the class annotations used by XML serialization in the .NET Framework. Basically this means that if one reads in an XML document, converts it to objects using the XmlSerializer, then creates an ObjectXPathNavigator over those objects, the XML view provided by the ObjectXPathNavigator will not be the same as the XML generated when the class is serialized back out via the XmlSerializer.
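A small example makes the mismatch obvious. Given a class annotated for XML serialization (the Book class below is hypothetical):

    using System;
    using System.Xml.Serialization;

    public class Book
    {
        [XmlAttribute("isbn")]
        public string Isbn;   // the XmlSerializer emits this as an attribute

        public string Title;  // ...and this as a child element
    }

    class Demo
    {
        static void Main()
        {
            Book book = new Book();
            book.Isbn = "0-201-63361-2";
            book.Title = "Design Patterns";

            new XmlSerializer(typeof(Book)).Serialize(Console.Out, book);
            // The XmlSerializer writes (namespace declarations elided):
            //   <Book isbn="0-201-63361-2"><Title>Design Patterns</Title></Book>
            // A reflection-only navigator that ignores [XmlAttribute] would instead expose:
            //   <Book><Isbn>0-201-63361-2</Isbn><Title>Design Patterns</Title></Book>
        }
    }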

In fact, for the ObjectXPathNavigator to provide the same XML view of an object as the XmlSerializer, it would have to understand the various attributes for annotating classes from the System.Xml.Serialization namespace. Considering that in the future the XPathNavigator should be the primary API for accessing XML in the .NET Framework, it would be extremely useful if there was an API that allowed any object to be treated as a first class citizen of the XML world. The first step was the XmlSerializer, which allowed any class to be saved and loaded to and from XML streams; the next step should be enabling any object to be accessed in the same way XML documents are. Instant benefits are things like the ability to perform XPath queries and XSLT transformations over arbitrary objects. In the Whidbey/Yukon (Visual Studio v.next/SQL Server v.next) timeframe this means getting stuff like XQuery over objects or the ability to convert any object graph to an XmlReader for free.

It looks like I have a winter project, but first I have to finish this month's column on EXSLT.NET. *sigh*


 

Categories: XML

November 3, 2003
@ 04:04 AM

I just spent two hours trying to figure out why a chunk of code was throwing exceptions when compiled and run from my command line application but similar code worked fine when running in Visual Studio. The problem turned out to be a bug in the .NET Framework's XslTransform class which existed in v1.0 of the .NET Framework but not in v1.1. Since I was using Visual Studio.NET 2003 [which uses v1.1 of the .NET Framework] to run my test app but compiling my actual application with the compiler for v1.0 of the .NET Framework, I wasn't actually doing an apples-to-apples comparison when running both apps.

I'm tempted to uninstall v1.0 of the .NET Framework from my machine so I don't end up facing this problem again. What a waste of time.

 

Categories: Ramblings

Jon Udell writes:

Reading the Longhorn SDK docs is a disorienting experience. Everything's familiar but different.

Example 3: The new XSD:

"WinFS" Schema Definition Language "WinFS" introduces a schema definition language to describe "WinFS" types. This language is an XML vocabulary. "WinFS" includes a set of schemas that define a set of Item types and NestedElement types. These are called Windows types.

Yeah, "embrace and extend" was so much fun, I can hardly wait for "replace and defend." Seriously, if the suite of standards now targeted for elimination from Microsoft's actively-developed portfolio were a technological dead end, ripe for disruption, then we should all thank Microsoft for pulling the trigger. If, on the other hand, these standards are fundamentally sound, then it's a time for what Clayton Christensen calls sustaining rather than disruptive advances. I believe the ecosystem needs sustaining more than disruption. Like Joe, I hope Microsoft's bold move will mobilize the sustainers.

I can't speak for the other technologies, Jon, but being the program manager responsible for XML schema technologies in the .NET Framework [as well as the fact that the offices of a number of WinFS folks are a few doors away from mine] I can speak on the above example.

The first thing I'd like to note about Jon's example is that the W3C XML Schema definition language (XSD) is far from being targeted for elimination from Microsoft's actively-developed portfolio. We have almost a dozen technologies and products that utilize XSD in some way, shape or form: SQL Server [Yukon], Visual Studio.NET, Indigo, Word, Excel, InfoPath, SQLXML, the .NET Framework's DataSet, BizTalk, FoxPro, the .NET Framework's XmlSerializer and a couple of others. The fact that one technology or product decides that it makes more sense to create an XML vocabulary that meets its specific needs, instead of shoehorning an inappropriate technology into its use case and thus causing pain for itself and its customers, is a wise decision that should be lauded (in fact, this is the entire point XML was invented in the first place; see my "SGML on the Web" post) rather than decried as another evil conspiracy by Microsoft.

Now to go into specifics. The WinFS schema language and W3C XML Schema do two fundamentally different things; W3C XML Schema is a way for describing the structure and contents of an XML document while a WinFS schema describes types and relationships of items stored in WinFS. At first glance, the only thing that connects both schema languages is that they are both written in XML. Of course, this is just syntax so this similarity isn't any more significant than the fact that both SQL and Java use English keywords.

Where it gets interesting is if one asks whether the WinFS data model can be mapped to XML and XSD then used to describe the structure of this XML view of WinFS. This is possible but leads to impedance mismatches due to the differences between the WinFS model and that used by W3C XML Schema. Specifically, there are constructs in W3C XML Schema that don't map to concepts in WinFS, and concepts in WinFS that don't map to constructs in W3C XML Schema. So for WinFS to use W3C XML Schema as the syntax for its schema language it would have to do two things:

  1. Support a subset of W3C XML Schema
  2. Extend W3C XML Schema to add support for WinFS concepts

The problem with this approach is that it leads to complaints of a different kind. The first is user confusion, because not every valid W3C XML Schema construct is usable in this context; the other is that W3C XML Schema ends up being extended in a proprietary manner, which eventually leads to yells of "embrace and extend". By the way, this isn't guesswork on my part; this is a description of what happened when Microsoft took this approach with the .NET Framework's DataSet class and SQLXML. History has taught us that these approaches were unwise and used W3C XML Schema outside of its original goals and usage scenarios. Instead, we have moved back to the original goals of XML where, instead of relying on one uber-vocabulary for all one's needs, vocabularies specific to various situations are built but can be translated or transformed as needed when moving from one domain to another.

Ideally, even though WinFS has its own schema language, it makes sense that it should be able to import or export WinFS items as XML described using a W3C XML Schema, since this is the most popular way to transfer structured and semi-structured data in our highly connected world. This functionality is something I've brought up with the WinFS architects, who have stated it will be investigated.


 

Categories: Life in the B0rg Cube

Drew Marsh blogged about the talk given by my boss at this year's Microsoft Professional Developer's Conference (PDC) entitled "What's New In System.Xml For Whidbey?". Since I'm directly responsible for some of the stuff mentioned in the talk I thought it would make sense to offer some clarifications or add details where they were lacking from his coverage.

Usability Improvements (Beta 1)

  • CLR type accessors on XmlReader, XmlWriter and XPathNavigator: Double unitPrice = reader.ValueAsDouble

 

This was a big gripe from folks about v1.0 of the .NET Framework: they couldn't access the XML in a validated document as a typed value. This is no longer the case in Whidbey. However, people who want this functionality will have to move to the XPathDocument instead of the XmlDocument. People will be able to get typed values from an XmlDocument (actually from anything that implements IXPathNavigable), but actually storing the data in the in-memory representation as a typed value will only be available on the XPathDocument.
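Here's roughly what that looks like; bear in mind this is a sketch based on the PDC bits, and member names like ValueAsDouble were still in flux at the time:

    using System;
    using System.Xml;

    class TypedValues
    {
        static void Main()
        {
            // Assumes prices.xml validates against a schema that declares
            // unitPrice as xs:double; XmlReader.Create is the Whidbey factory method.
            XmlReader reader = XmlReader.Create("prices.xml");
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "unitPrice")
                {
                    reader.Read();                           // advance to the text node
                    double unitPrice = reader.ValueAsDouble; // typed access, no Double.Parse
                    Console.WriteLine(unitPrice * 1.1);      // use it as a real double
                }
            }
        }
    }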

 

XPathDocument: A Better XML DOM

"XmlDocument is dead."

 

  • XPathDocument replaces the XmlDocument as the primary XML store.
  • Feature Set
    • 20%-40% more performant for XSLT and XQuery
    • Editing capabilities through the XPathEditor (derives from XPathDocument) using an XmlWriter (the mythical XmlNodeWriter we've all been searching for).
    • XML schema validation
    • Strongly typed store. Integers stored as int internally (per schema) (Beta 1)
    • Change tracking at node level
    • UI databinding support to WinForms and ASP.NET controls (Beta 1)

Yup, in v1.0 of the .NET Framework we moved away from the push-based parsing model (SAX) of MSXML to a pull-based parser (XmlReader). In v2.0 of the .NET Framework there's been a similar shift, from the DOM data model & tree-based APIs for accessing XML to the XPath data model & cursor-based APIs. If you are curious about some of the thinking that went into this decision you should take a look at my article in XML Journal entitled Can One Size Fit All?
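For anyone who hasn't worked with the cursor model, it's already available in v1.x via the XPathNavigator. Here's a trivial example over a hypothetical books.xml:

    using System;
    using System.Xml.XPath;

    class CursorDemo
    {
        static void Main()
        {
            XPathDocument doc = new XPathDocument("books.xml");
            XPathNavigator nav = doc.CreateNavigator();

            // Instead of walking a tree of node objects (DOM style), a single
            // cursor is positioned over the results of an XPath query.
            XPathNodeIterator titles = nav.Select("/books/book/title");
            while (titles.MoveNext())
                Console.WriteLine(titles.Current.Value);
        }
    }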

 

Note: XPathDocument2 in PDC bits will be XPathDocument once again by Beta 1. "We were at an unfortunate design stage at the point where the PDC bits were created."

Yeah, things were in flux for a while during our development process. The features of the class called XPathDocument2 in the PDC builds will be integrated back into the XPathDocument class that was in v1.0 of the .NET Framework.

 

The rest of the stuff in the talk (XQuery, the new XML editor in Visual Studio.NET, ADO.NET with SQLXML, etc.) isn't stuff I'm directly responsible for, so I hesitate to comment further; however, Drew has taken excellent notes about them, so it is clear which direction we're going in for Whidbey.

 

Categories: Life in the B0rg Cube | XML

The third article in my semi-regular series of guidelines for working with W3C XML Schema on XML.com is now up. It's entitled XML Schema Design Patterns: Is Complex Type Derivation Unnecessary? and is excerpted below for those who may not have the time to read the entire piece.

INTRODUCTION

W3C XML Schema (WXS) possesses a number of features that mimic object oriented concepts, including type derivation and polymorphism. However, real world experience has shown that these features tend to complicate schemas, may have subtle interactions that lead to tricky problems, and can often be replaced by other features of WXS. In this article I explore both derivation by restriction and derivation by extension of complex types, showing the pros and cons of both techniques as well as alternatives for achieving the same results.

MIDDLE

As usage of XML and XML schema languages has become more widespread, two primary usage scenarios have developed around XML document validation and XML schemas.

  1. Describing and enforcing the contract between producers and consumers of XML documents: ...
  2. Creating the basis for processing and storing typed data represented as XML documents: ...

CONCLUSION

Based on the current technological landscape, the complex type derivation features of WXS may add more problems than they solve in the two most common schema use cases. For validation scenarios, derivation by restriction is of marginal value, while derivation by extension is a good way to create modularity as well as encourage reuse. Care must, however, be taken to consider the ramifications of the various type substitutability features of WXS (xsi:type and substitution groups) when using derivation by extension in scenarios revolving around document validation.

Currently processing and storage of strongly typed XML data is primarily the province of conventional OOP languages and relational databases respectively. This means that certain features of WXS such as derivation by restriction (and to a lesser extent derivation by extension) cause an impedance mismatch between the type system used to describe strongly typed XML and the mechanisms used for processing and storing said XML. Eventually when technologies like XQuery become widespread for processing typed XML and support for XML and W3C XML Schema is integrated into mainstream database products this impedance mismatch will not be important. Until then complex type derivation should be carefully evaluated before being used in situations where W3C XML Schema is primarily being used as a mechanism to create type annotated XML infosets.


 

Categories: XML