Tuesday, 04 November 2003 - Dare Obasanjo's weblog

November 4, 2003

@ 03:59 PM

XPath Over Arbitrary Object Graphs: The Perils of Being Easily Distracted

I was planning to write this month's Extreme XML column on the recently released EXSLT.NET implementation produced by myself and a couple of others. One of the cool things about the EXSLT.NET project is that we added the ability to use the EXSLT extension functions in XPath queries over any data source that provides an XPathNavigator (i.e. implements IXPathNavigable). Thus one would be able to use functions like set:distinct and regexp:match when running XPath queries over objects that implement the IXPathNavigable interface such as the XPathDocument, XmlDocument or XmlDataDocument.

In constructing my examples I decided that it would be even cooler to show the extensibility of the .NET Framework if I showed how one could use the XPath extension functions in queries over implementations of XPathNavigator not provided by the .NET Framework such as my perennial favorite, the ObjectXPathNavigator.

After fixing some bugs in the ObjectXPathNavigator implementation on MSDN (MoveToParent() didn't take you to the root node from the document element and the navigator only exposed public properties but not public fields) I came across a problem which will probably turn into yet another project on GotDotNet workspaces. The heuristics the ObjectXPathNavigator uses to provide an XML view of an arbitrary object graph don't take into account the class annotations used by XML Serialization in the .NET Framework. Basically this means that if one reads in an XML document, converts it to objects using the XmlSerializer, then creates an ObjectXPathNavigator over the objects, the XML view of the objects provided by the ObjectXPathNavigator would not be the same as the XML generated when the class is serialized as XML via the XmlSerializer.

In fact, for the ObjectXPathNavigator to provide the same XML view of an object as the XmlSerializer would involve having it understand the various attributes for annotating classes from the System.Xml.Serialization namespace. Considering that in the future the XPathNavigator should be the primary API for accessing XML in the .NET Framework, it would be extremely useful if there was an API that allowed any object to be treated as a first class citizen of the XML world. The first step was the XmlSerializer, which allowed any class to be saved and loaded to and from XML streams; the next step should be enabling any object to be accessed in the same way XML documents are as well. Instant benefits are things like the ability to perform XPath and XSLT over arbitrary objects. In the Whidbey/Yukon (Visual Studio v.next/SQL Server v.next) timeframe this means getting stuff like XQuery over objects or the ability to convert any object graph to an XmlReader for free.
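To make the idea of an XML view over an object graph concrete, here is a minimal sketch in Python rather than .NET. It is only an analogy and not the ObjectXPathNavigator: the helper to_element and the Book and Library classes are hypothetical names invented for illustration, and the sketch ignores exactly the kind of serialization annotations discussed above. It walks an object's fields by reflection, builds an XML tree, and then runs XPath-style queries over it.

```python
import xml.etree.ElementTree as ET


def to_element(name, value):
    """Build an XML element for value, recursing into lists and objects."""
    elem = ET.Element(name)
    if isinstance(value, (list, tuple)):
        # Each list item becomes a child element named after its class.
        for item in value:
            elem.append(to_element(type(item).__name__, item))
    elif hasattr(value, "__dict__"):
        # Each instance field becomes a child element named after the field.
        for field, field_value in vars(value).items():
            elem.append(to_element(field, field_value))
    else:
        # Plain values (str, int, ...) become text content.
        elem.text = str(value)
    return elem


class Book:  # hypothetical example class
    def __init__(self, title, price):
        self.title = title
        self.price = price


class Library:  # hypothetical example class
    def __init__(self, books):
        self.books = books


library = Library([Book("XML Basics", 20), Book("XSLT Cookbook", 35)])
root = to_element("Library", library)

# ElementTree supports a limited XPath subset, enough for simple queries.
titles = [t.text for t in root.findall("./books/Book/title")]
pricey = [t.text for t in root.findall("./books/Book[price='35']/title")]
```

The naming rules here (field name becomes element name, class name for list items) are one arbitrary heuristic; a view that matched a serializer's output would have to honor that serializer's naming and attribute-versus-element rules instead.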

It looks like I have a winter project, but first I have to finish this month's column on EXSLT.NET. *sigh*

Categories: XML

November 3, 2003

@ 04:04 AM

Comments [2]

A Waste of Two Hours

I just spent two hours trying to figure out why a chunk of code was throwing exceptions when compiled and run from my command line application but similar code worked fine when running in Visual Studio. The problem turned out to be a bug in the .NET Framework's XslTransform class which existed in v1.0 of the .NET Framework but not v1.1. Since I was using Visual Studio.NET 2003 [which uses v1.1 of the .NET Framework] to run my test app but compiling my actual application with the compiler for v1.0 of the .NET Framework I wasn't actually doing an apples-for-apples comparison when running both apps.

I'm tempted to uninstall v1.0 of the .NET Framework from my machine so I don't end up facing this problem again. What a waste of time.

Categories: Ramblings

October 31, 2003

@ 10:06 PM

Comments [0]

On Jon Udell's Replace and Defend Theory

Jon Udell writes

Reading the Longhorn SDK docs is a disorienting experience. Everything's familiar but different.

Example 3: The new XSD:

"WinFS" Schema Definition Language "WinFS" introduces a schema definition language to describe "WinFS" types. This language is an XML vocabulary. "WinFS" includes a set of schemas that define a set of Item types and NestedElement types. These are called Windows types.

Yeah, "embrace and extend" was so much fun, I can hardly wait for "replace and defend." Seriously, if the suite of standards now targeted for elimination from Microsoft's actively-developed portfolio were a technological dead end, ripe for disruption, then we should all thank Microsoft for pulling the trigger. If, on the other hand, these standards are fundamentally sound, then it's a time for what Clayton Christensen calls sustaining rather than disruptive advances. I believe the ecosystem needs sustaining more than disruption. Like Joe, I hope Microsoft's bold move will mobilize the sustainers.

I can't speak for the other technologies Jon mentions, but being the program manager responsible for XML schema technologies in the .NET Framework [as well as the fact that the offices of a number of WinFS folks are a few doors away from mine] I can speak on the above example.

The first thing I'd like to note about Jon's example is that the W3C XML Schema definition language (XSD) is far from being targeted for elimination from Microsoft's actively-developed portfolio. We have almost a dozen technologies and products that utilize XSD in some way, shape or form: SQL Server [Yukon], Visual Studio.NET, Indigo, Word, Excel, InfoPath, SQLXML, the .NET Framework's DataSet, BizTalk, FoxPro, the .NET Framework's XmlSerializer and a couple of others. The fact that one technology or product decides that it makes more sense to create an XML vocabulary that meets their specific needs, instead of shoehorning an inappropriate technology into their use case and thus causing pain for themselves and their customers, is a wise decision that should be lauded (in fact, this is the entire reason XML was invented in the first place, see my "SGML on the Web" post) instead of being heralded as another evil conspiracy by Microsoft.

Now to go into specifics. The WinFS schema language and W3C XML Schema do two fundamentally different things; W3C XML Schema is a way for describing the structure and contents of an XML document while a WinFS schema describes types and relationships of items stored in WinFS. At first glance, the only thing that connects both schema languages is that they are both written in XML. Of course, this is just syntax so this similarity isn't any more significant than the fact that both SQL and Java use English keywords.

Where it gets interesting is if one asks whether the WinFS data model can be mapped to XML and then XSD used to describe the structure of this XML view of WinFS. This is possible but leads to impedance mismatches due to the differences between the WinFS model and that used by W3C XML Schema. Specifically there are constructs in W3C XML Schema that don't map to concepts in WinFS and concepts in WinFS that don't map to constructs in W3C XML Schema. So for WinFS to use W3C XML Schema as the syntax for its schema language it would have to do two things

Support a subset of W3C XML Schema
Extend W3C XML Schema to add support for WinFS concepts

The problem with this approach is that it leads to complaints of a different kind. The first is that there is user confusion because not every valid W3C XML Schema construct is usable in this context, and the other is that W3C XML Schema will end up being extended in a proprietary manner, which eventually leads to yells of "embrace and extend". By the way, this isn't guesswork on my part; this is a description of what happened when Microsoft took this approach with the .NET Framework's DataSet class and SQLXML. History has taught us that these approaches were unwise and used W3C XML Schema outside of its original goals and usage scenarios. Instead, we have moved back to the original goals of XML where instead of relying on one uber-vocabulary for all one's needs, vocabularies specific to various situations are built but can be translated or transformed as needed when moving from one domain to another.

Ideally, even though WinFS has its own schema language it makes sense that it should be able to import or export WinFS items as XML described using a W3C XML Schema since this is the most popular way to transfer structured and semi-structured data in our highly connected world. This functionality is something I've brought up with the WinFS architects, which they have stated will be investigated.

Categories: Life in the B0rg Cube

October 30, 2003

@ 03:01 AM

Comments [0]

What's New For XML Programming Models in the Next Version of the .NET Framework

Drew Marsh blogged about the talk given by my boss at this year's Microsoft Professional Developer's Conference (PDC) entitled "What's New In System.Xml For Whidbey?". Since I'm directly responsible for some of the stuff mentioned in the talk I thought it would make sense if I made some clarifications or added details where some were lacking from his coverage.

Usability Improvements (Beta 1)

CLR type accessors on XmlReader, XmlWriter and XPathNavigator: Double unitPrice = reader.ValueAsDouble

A big gripe from folks using v1.0 of the .NET Framework was that they couldn't access the XML in a validated document as a typed value; this is no longer the case in Whidbey. However, people who want this functionality will have to move to the XPathDocument instead of the XmlDocument. People will be able to get typed values from an XmlDocument (actually from anything that implements IXPathNavigable) but actually storing the data in the in-memory representation as a typed value will only be available on the XPathDocument.
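As a rough illustration of what a typed accessor buys you, here is a small Python sketch. It is an analogy only, not the Whidbey API: SCHEMA_TYPES and typed_value are hypothetical names, and a hand-written dictionary stands in for the type information a real XSD validation pass would supply.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for schema type information: element name -> type.
SCHEMA_TYPES = {"unitPrice": float, "quantity": int}


def typed_value(elem):
    """Return the element's text converted per SCHEMA_TYPES (default: str)."""
    convert = SCHEMA_TYPES.get(elem.tag, str)
    return convert(elem.text)


item = ET.fromstring(
    "<item><name>pen</name><unitPrice>19.95</unitPrice><quantity>3</quantity></item>"
)

name = typed_value(item.find("name"))             # stays a string
unit_price = typed_value(item.find("unitPrice"))  # becomes a float
quantity = typed_value(item.find("quantity"))     # becomes an int
total = unit_price * quantity                     # arithmetic without manual parsing
```

Note that this sketch converts on every access; the distinction drawn above is that a typed store would keep the converted value in memory instead of re-parsing the text each time.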

XPathDocument A Better XML DOM

"XmlDocument is dead."

XPathDocument replaces the XmlDocument as the primary XML store.
Feature Set

20%-40% more performant for XSLT and XQuery
Editing capabilities through the XPathEditor (derives from XPathDocument) using an XmlWriter (the mythical XmlNodeWriter we've all been searching for).
XML schema validation
Strongly typed store. Integers stored as int internally (per schema) (Beta 1)
Change tracking at node level
UI databinding support to WinForms and ASP.NET controls (Beta 1)

Yup, in v1.0 of the .NET Framework we moved away from a push-based parser (SAX) in MSXML to a pull-based parser (XmlReader) in the .NET Framework. In v2.0 of the .NET Framework there's been a similar shift, from the DOM data model & tree based APIs for accessing XML to the XPath data model & cursor based APIs for accessing XML. If you are curious about some of the thinking that went into this decision you should take a look at my article in XML Journal entitled Can One Size Fit All?
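For readers who haven't used both styles, the push versus pull distinction can be sketched with the Python standard library. This is an analogy and not MSXML or XmlReader code; ItemHandler and the sample document are made up for illustration. With SAX the parser drives and calls back into your handler, while with a pull-style iterator your own loop drives and asks for the next event.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

XML = b"<order><item>pen</item><item>ink</item></order>"


# Push model: the parser owns the loop and calls our handler methods.
class ItemHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self.items.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self.items[-1] += content

    def endElement(self, name):
        if name == "item":
            self._in_item = False


handler = ItemHandler()
xml.sax.parseString(XML, handler)
pushed = handler.items

# Pull model: our loop owns control flow and requests one event at a time.
pulled = []
for event, elem in ET.iterparse(io.BytesIO(XML), events=("end",)):
    if elem.tag == "item":
        pulled.append(elem.text)
```

The push version has to keep its own state flags across callbacks, whereas the pull version keeps state in ordinary local variables, which is a large part of why pull-based APIs tend to be easier to program against.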

Note: XPathDocument2 in PDC bits will be XPathDocument once again by Beta 1. "We were at an unfortunate design stage at the point where the PDC bits were created."

Yeah, things were in flux for a while during our development process. The features of the class called XPathDocument2 in the PDC builds will be integrated back into the XPathDocument class that was in v1.0 of the .NET Framework.

The rest of the stuff in the talk (XQuery, new XML editor in Visual Studio.NET, ADO.NET with SQLXML, etc) isn't stuff I'm directly responsible for so I hesitate to comment further, however Drew has taken excellent notes about them so it is clear which direction we're going in for Whidbey.

Categories: Life in the B0rg Cube | XML

October 30, 2003

@ 01:47 AM

Comments [0]

XML Schema Design Patterns (part 3)

The third in my semi-regular series of guidelines for working with W3C XML Schema for XML.com is now up. The article is entitled XML Schema Design Patterns: Is Complex Type Derivation Unnecessary? and is excerpted below for those who may not have the time to read the entire article.

INTRODUCTION

W3C XML Schema (WXS) possesses a number of features that mimic object oriented concepts, including type derivation and polymorphism. However, real world experience has shown that these features tend to complicate schemas, may have subtle interactions that lead to tricky problems, and can often be replaced by other features of WXS. In this article I explore both derivation by restriction and derivation by extension of complex types, showing the pros and cons of both techniques as well as alternatives for achieving the same results.

MIDDLE

As usage of XML and XML schema languages has become more widespread, two primary usage scenarios have developed around XML document validation and XML schemas.

Describing and enforcing the contract between producers and consumers of XML documents: ...
Creating the basis for processing and storing typed data represented as XML documents: ...

CONCLUSION

Based on the current technological landscape the complex type derivation features of WXS may add more problems than they solve in the two most common schema use cases. For validation scenarios, derivation by restriction is of marginal value, while derivation by extension is a good way to create modularity as well as encourage reuse. Care must however be taken to consider the ramifications of the various type substitutability features of WXS (xsi:type and substitution groups) when using derivation by extension in scenarios revolving around document validation.

Currently processing and storage of strongly typed XML data is primarily the province of conventional OOP languages and relational databases respectively. This means that certain features of WXS such as derivation by restriction (and to a lesser extent derivation by extension) cause an impedance mismatch between the type system used to describe strongly typed XML and the mechanisms used for processing and storing said XML. Eventually when technologies like XQuery become widespread for processing typed XML and support for XML and W3C XML Schema is integrated into mainstream database products this impedance mismatch will not be important. Until then complex type derivation should be carefully evaluated before being used in situations where W3C XML Schema is primarily being used as a mechanism to create type annotated XML infosets.

Categories: XML

October 29, 2003

@ 01:57 PM

Comments [2]

On the Death of the XML Database

A recent article by Phil Howard of Bloor Research on IT-Director.com talks about the Demise of the XML Database. Excerpts below

While you can still buy an XML database purely because it provides faster storage capability and greater functionality than a conventional database, all the erstwhile XML database vendors are increasingly turning to other sources of use for their products.

These other markets basically consist of two different sectors: the use of XML databases as a part of an integration strategy, where the database is used to provide on-the-fly translation for XML documents, and for content management...

The reason why there is this trend away from pure XML storage is because advanced XML capabilities are being introduced by all the leading relational vendors.

This has been considered "fighting words" from some in the XML database camp such as Mike Champion (works on Tamino XML database) and Kimbro Staken (one of the originators of Apache Xindice). Mike Champion comes up with a number of counter-arguments to the claims in the article I found interesting and felt compelled to comment on. According to Mike

It is widely believed that less than a quarter of enterprise data is currently stored in RDBMS systems. This suggests that the market is not "making do" with what the relational database products offer today, but using a wide variety of technologies.

This is actually the mantra of the team I work for at Microsoft. We are responsible for data access technologies (Relational, Object and XML) and our GM is fond of trotting out the quote about "less than a quarter of enterprise data is currently stored in a relational database". A lot of data important to businesses is just sitting around on file systems in various Microsoft Office documents and other file formats. The bet across the software industry is that moving all these semi-structured business documents to XML is the right way to go, and the first step has been achieved given that modern business productivity software (including the Open Source offerings) is moving to fully supporting XML for its document formats. Step one is definitely to get all those memos, contracts and spreadsheets into XML.

The main reason OODBMS didn't hit the sweet spot, AFAIK, is that they created a tight coupling between application code and the DBMS. Potential performance gains this allows can outweigh the maintenance challenges in extremely business critical, high transaction volume environments...XML DBMS, on the other hand, inherit XML's suitability for loosely coupling systems, applications, and tools across a wide range of environments.

Totally agree here about the weakness of OODBMSs in creating a tight coupling between applications and the data they accessed. For a more in-depth description of the disadvantages of object oriented databases in comparison to their relational counterparts you can read my article An Exploration of Object Oriented Database Management Systems.

Again AFAIK (having only played with OODBMS personally), there is relatively little portability across OODBMS systems; code written for one would be very expensive to adapt to another. Investing in the technology required one to make a risky bet on the vendor who supplied it. This created an environment where the object-relational vendors could prosper by offering only a subset of the features but the absolute assurance that they would be in business for years to come. In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards;

There are two points Mike is making here

There is very little portability across OODBMS systems.
In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards

Based on my experiences with OODBMSs the first claim is entirely accurate; moving data from one OODBMS to another was a pain and there was a definite lack of standardization of APIs and query languages across various products. The second claim is rather suspect to me. I am unaware of any schema, query or API standards that are supported uniformly across XML database products. This isn't to say there aren't standardized W3C branded XML schema languages or query languages nor that there haven't been moves to come up with standard XML database APIs but when last I looked these weren't uniformly supported across many of the XML database products and where they were there was a distinct lack of maturity in their offerings. Granted it's been almost a year since I last looked.

However there is an obvious point about portability that Mike doesn't mention (perhaps because it is so obvious). The entire point of XML is portability and interoperability; moving data from one XML database to another should simply be a case of "export database as XML" from one and "import XML into database" on the other.

The standards of the XML world provide a clearly defined and fairly high bar for those who would seek to take away the market pioneered by the XML DBMS vendors. For better or worse, the XML family of specs is complex and quite challenging to support efficiently in a DBMS system. It's one thing to support, as the RDBMS vendors now do quite well, XML views of structured, typed, relatively "flat" data such as are typically found in RDBMS applications. It is quite another to efficiently and scalably support queries and updates on "document-like" XML with relatively open content models, lots of recursion, mixed content, and where wildcard text comparisons are more frequent than typed value comparisons. The dominant DBMS vendors obviously have talent and money to throw at the problem, but analysts should not assume that they will surpass these capabilities of the XML DBMS systems anytime soon.

OK, this one sounds like FUD. Basically Mike seems to be saying the family of XML specs is so complex (thanks to the W3C, but that's another story) that companies like Oracle, IBM and Microsoft won't be able to come up with ways to query semi-structured data efficiently or perform text comparison searches well, so you are best off sticking to a separate database for your XML data instead of having all your data stored in a single unified store.

So what is my position on the death of native XML databases? Like Phil Howard, I suspect that once XML support becomes [further] integrated into mainstream relational databases (which it already has to some degree) then native XML database vendors will be hard pressed to come up with reasons why one would want to buy a separate product for storing XML data distinct from the rest of the data for a business when a traditional relational database can store it all. It's all about integration. Businesses prefer buying a single office productivity suite to mixing and matching word processors, spreadsheets and presentation programs from different vendors. I suspect the same is true when it comes to their data storage needs.

Categories: XML

October 29, 2003

@ 12:49 PM

Comments [0]

Making Love To Vacuum Cleaners

Beware the Hoover Dustette

Categories: Ramblings

October 29, 2003

@ 04:33 AM

Comments [16]

RSS Bandit 1.2.0.43 Released

Get it here

Differences between v1.2.0.43 and v1.2.0.42 below

This primarily fixes complaints by Roy Osherove about the responsiveness of RSS Bandit. Eliminated a number of places where GUI locks up when performing tasks by using background threads. Also reduced number of threads used by thread pool when downloading feeds.
Added missing source files to installer package, enabling one to compile RSS Bandit from source if so desired.
Reverted to behavior where links are opened in a new tabbed browser pane as opposed to the feed display pane
Fixed: ArgumentNullException when attempt made to "Refresh Feed" when the top most 'My Feeds' node is selected in the tree view.
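The first item above combines two ideas: moving slow work off the UI thread, and capping how many worker threads download feeds at once. A minimal Python sketch of the second idea follows. It is not RSS Bandit's actual code; download_feed and the feed names are placeholders, and the fake download does no network I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def download_feed(feed_id):
    """Placeholder for a slow network fetch; returns a fake result."""
    return f"fetched {feed_id}"


FEEDS = ["feed-a", "feed-b", "feed-c", "feed-d"]  # hypothetical feed ids

# max_workers caps how many downloads run concurrently, so refreshing many
# feeds cannot spawn an unbounded number of threads.
with ThreadPoolExecutor(max_workers=2) as pool:
    # map() returns results in the same order as the inputs.
    results = list(pool.map(download_feed, FEEDS))
```

In a GUI application the caller would typically submit this work from an event handler and update the display when results arrive, rather than blocking on the pool as this sketch does.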

Categories: RSS Bandit

October 28, 2003

@ 08:16 AM

Comments [3]

Weblogging Birds-Of-A-Feather At PDC

Randy Holloway writes

This session was set up as an open conversation, with only one concrete agenda item. That being RSS versus Atom.

Interesting, a bunch of developers get together to discuss weblogging technologies and they discuss the most irrelevant piece of the puzzle. For those not keeping track, there are two primary weblog syndication formats in popular usage; the RDF-based RSS 1.0 and Dave Winer's RSS 0.91/RSS 2.0. Developers tend to prefer Dave Winer's specs to the RSS 1.0 branch but due to various interpersonal issues with Dave Winer (unsurprising since he can be quite trying) a bunch of people decided to create a third syndication format (Atom) which duplicates the functionality of the other two primarily to get around the fact that Dave Winer controlled the spec for the most popular feed syndication format. This third format adds little to the table besides fragmenting the feed syndication world which already has to deal with RSS 1.0 vs. RSS 0.91/RSS 2.0 issues. In fact, this redundancy is currently being debated on the atom-syntax list.

There are three major technologies (i.e. XML formats) in the blogging world: feed syndication (RSS), blog editing (Blogger API & MetaWeblog API) and feed list information (OPML). Dave Winer's specs are dominant in all three areas given that he is the author of both the MetaWeblog API and OPML spec. Of the three of them, RSS is probably the best of the specs and meets the needs of most users except for the Semantic Web folks who want an RDF-based format (i.e. RSS 1.0). On the other hand, there are significant deficiencies in both the MetaWeblog API and OPML. I have blogged about What is wrong with the MetaWeblog API as well as mentioned some of the problems with OPML as a format for storing information about subscribed feeds.

Given that RSS is the best of the weblog related technologies while the blog editing and feed list formats are actually the technologies with problems one might wonder why there is so much energy invested in fixing what isn't that broken instead of trying to tackle actual problems that affect developers and users of blogging tools? One of the answers to this question comes from a comment that Randy Holloway says someone made during the Weblogging BOF

"We don't solve problems, we just talk about them."

Most of the people engaged in the discussions don't actually write any code or at least not any weblogging related code, so they are unaware of the real problems but instead focus on simple yet irrelevant issues that are easy to grok. This is definitely a case of bike shedding. [Hmmm, I love the term "bike shedding" so much I dug up the original source of the phrase]

Speaking of Atom, I'm curious as to how all the XML Web Services folks at the Weblogging BOF felt about the fact that the current drafts of the ATOM API use just HTTP and XML instead of the XML Web Services buzzword soup (SOAP, WSDL, etc), meaning they won't be using Indigo to code against it, just plain old System.Xml and System.Net. If "XML-RPC is a fantastic solution... from a while ago" I wonder what they think of using just HTTP with no fancy object<->XML mappings, positively prehistoric :)
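To show how little machinery the "just HTTP and XML" approach needs, here is a hedged Python sketch that builds an XML payload and an HTTP POST request by hand. The URL, the element names and the content type are placeholders chosen for illustration; they are not taken from any draft of the Atom API, and the request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Build the XML payload directly; no object<->XML mapping layer involved.
entry = ET.Element("entry")                          # placeholder element names
ET.SubElement(entry, "title").text = "Hello"
ET.SubElement(entry, "content").text = "First post"
body = ET.tostring(entry, encoding="utf-8")          # bytes, with XML declaration

# Describe the HTTP request; the endpoint URL is a made-up example.
request = urllib.request.Request(
    "http://example.org/blog/entries",
    data=body,
    headers={"Content-Type": "application/xml"},
    method="POST",
)

# Sending would be urllib.request.urlopen(request); omitted here because
# the endpoint is fictional.
```

Everything a client needs is an XML library and an HTTP library, which is the point being made about System.Xml and System.Net.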

Categories: Ramblings

October 28, 2003

@ 06:58 AM

Comments [2]

What Exactly Can RSS Bandit Do?

I finally got around to providing a list of the major features of RSS Bandit on the RSS Bandit wiki this evening. I was motivated to do this after reading a blog post by Luke Huttemann, the author of SharpReader, and the ensuing comments. Of the ten comments in response to his post about an upcoming release of SharpReader, four of them were requests for features already in RSS Bandit (two of them actually mentioned RSS Bandit by name). I find it amusing that the requestors would rather wait for Luke to add the features to SharpReader than use RSS Bandit. That's definitely some brand loyalty at work. :)

Anyway, in the spirit of democracy there's another vote going on in the RSS Bandit workspace. This time it's to decide what features should be in the next release of RSS Bandit. It seems like structured search (mentioned in a previous blog entry) is high on everyone's priority list and will definitely make it into the next release. Actually the release after the next one since I plan to ship a bugfix release in the next couple of days to clean up some of the sloppiness in the last release since it was rushed to be in time for PDC.

One of the things that warms the cockles of my heart is seeing RSS Bandit mentioned in posts such as The Magic of Blogging. Having some of the stuff you work on mentioned in the same breath as the word "magic" is pretty fucking cool. It's great to see regular people actually getting some use out of software I helped build instead of just building plumbing for developers, which is my day job.

Categories: RSS Bandit

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
