Jon Udell writes

Reading the Longhorn SDK docs is a disorienting experience. Everything's familiar but different.

Example 3: The new XSD:

"WinFS" Schema Definition Language "WinFS" introduces a schema definition language to describe "WinFS" types. This language is an XML vocabulary. "WinFS" includes a set of schemas that define a set of Item types and NestedElement types. These are called Windows types.

Yeah, "embrace and extend" was so much fun, I can hardly wait for "replace and defend." Seriously, if the suite of standards now targeted for elimination from Microsoft's actively-developed portfolio were a technological dead end, ripe for disruption, then we should all thank Microsoft for pulling the trigger. If, on the other hand, these standards are fundamentally sound, then it's a time for what Clayton Christensen calls sustaining rather than disruptive advances. I believe the ecosystem needs sustaining more than disruption. Like Joe, I hope Microsoft's bold move will mobilize the sustainers.

I can't speak for the other technologies Jon but being the Program manager responsible for XML schema technologies in the .NET Framework [as well as the fact that the offices of a number of WinFS folks are a few doors away from mine] I can speak on the above example.

The first thing I'd like to note is that Jon's example is that the W3C XML Schema definition language (XSD) is far from being targetted for elimination from Microsoft's actively-developed portfolio? We have almost a dozen technologies and products that utilize XSD in some way shape or form; SQL Server [Yukon], Visual Studio.NET, Indigo, Word, Excel, InfoPath, SQLXML, the .NET framework's DataSet, BizTalk, FoxPro, the .NET Framework's XmlSerializer and a couple of others. The fact that one technology or product decides that it makes more sense to create an XML vocabulary that meets their specific needs instead of shoehorning an inappropriate technology into their use case and thus causing pain for themselves and their customers is a wise decision that should be lauded (In fact,this is the entire point XML was invented in the first place, see my "SGML on the Web" post) instead of being heralded as another evil conspiracy by Microsoft.

Now to go into specifics. The WinFS schema language and W3C XML Schema do two fundamentally different things; W3C XML Schema is a way for describing the structure and contents of an XML document while a WinFS schema describes types and relationships of items stored in WinFS. At first glance, the only thing that connects both schema languages is that they are both written in XML. Of course, this is just syntax so this similarity isn't any more significant than the fact that both SQL and Java use English keywords.

Where it gets interesting is if one asks whether the WinFS data model can be mapped to XML and then XSD used to describe the structure of this XML view of WinFS. This is possible but leads to impedance mismatches due to the differences between the WinFS model and that used by W3C XML Schema. Specifically there are constructs in W3C XML Schema that don't map to concepts in WinFS and concepts in WinFS that don't map to constructs in W3C XML Schema. So for WinFS to use W3C XML Schema as the syntax for its schema language it would have to do two things

  1. Support a subset of W3C XML Schema
  2. Extend W3C XML Schema to add support for WinFS concepts

The problem with this approach is that it leads to complaints of a different kind. The first being that there is user confusion because not every valid W3C XML Schema construct is usable in this context and the other being that W3C XML Schema will end up be extended in a proprietary manner which eventually leads to yells of "embrace and extend". By the way, this isn't guesswork on my part this is a description of what happened when Microsoft took this approach with the .NET Framework's DataSet class and SQLXML. History has taught us that these approaches were unwise and used W3C XML Schema outside of its original goals and usage scenarios. Instead, we have moved back to the original goals of XML where instead of relying on one uber-vocabulary for all ones needs, vocabularies specific to various situations are built but can be translated or transformed as needed when moving from one domain to another.

Ideally, even though WinFS has its own schema language it makes sense that it should be able to import or export WinFS items as XML described using an W3C XML Schema since this is the most popular way to transfer structured and semi-structured data in our highly connected world. This is functionality is something I've brought up with the WinFS architects which they have stated will be investigated.


 

Categories: Life in the B0rg Cube

Drew Marsh blogged about the talk given by my boss at this year's Microsoft Professional Developer's Conference (PDC) entitled What's New In System.Xml For Whidbey?". Since I'm directly responsible for some of the stuff mentioned in the talk I though it would make sense if I made some clarifications or added details where some where lacking from his coverage.

Usability Improvements (Beta 1)

  • CLR type accesors on XmlReader, XmlWriter and XPathNavigator: Double unitPrice = reader.ValueAsDouble

 

This was a big gripe from folks in v1.0 of the .NET Framework that they couldn't access the XML in a validated document as a typed value, this is no longer the case in Whidbey. However people who want this functionality will have to move to the XPathDocument instead of the XmlDocument. People will be able to get typed values from an XmlDocument (actually from anything that implements IXPathNavigable) but actually storing the data in the in-memory representation as a typed value will only be available on the XPathDocument.  

 

XPathDocument A Better XML DOM

"XmlDocument is dead."

 

  • XPathDocument replaces the XmlDocument as the primary XML store.
  • Feature Set
    • 20%-40% more performant for XSLT and Xquery
    • Editing capabilities through the XPathEditor (derives from XPathDocument) using an XmlWriter (the mythical XmlNodeWriter we've all been searching for).
    • XML schema validation
    • Strongly typed store. Integers stored as int internally (per schema) (Beta 1)
    • Change tracking at node level
    • UI databinding support to WinForms and ASP.NET controls (Beta 1)

Yup, in v1.0 of the .NET Framework we moved away from a push-based parser (SAX) in MSXML to a pull-based parser (XmlReader) in the .NET Framework. In v2.0 of the .NET Framework there's been a similar shift, from the DOM data model & tree based APIs for accessing XML to the XPath data model & cursor based APIs for accessing XML. If you are curious about some of the thinking that went into this decision you should take a look at my article in XML Journal entitled Can One Size Fit All? 

 

Note: XPathDocument2 in PDC bits will be XPathDocument once again by Beta 1. "We were at an unfortunate design stage at the point where the PDC bits were created."

Yeah, things were in flux for a while during our development process. The features of the class called XPathDocument2 in the PDC builds will be integrated back into the XPathDocument class that was in v1.0 of the .NET Framework.

 

The rest of the stuff in the talk (XQuery, new XML editor in Visual Studio.NET, ADO.NET with SQLXML, etc) isn't stuff I'm directly responsible for so I hesitate to comment further, however Drew has taken excellent notes about them so it is clear which direction we're going in for Whidbey.

 

 

 


 

Categories: Life in the B0rg Cube | XML

The third in my semi-regular series of guidelines for working with W3C XML Schema for XML.com is now up. The article is entitled XML Schema Design Patterns: Is Complex Type Derivation Unnecessary? and the article is excerpted below for those who may not have the time to read the entire article

INTRODUCTION

W3C XML Schema (WXS) possesses a number of features that mimic object oriented concepts, including type derivation and polymorphism. However real world experience has shown that these features tend to complicate schemas, may have subtle interactions that lead tricky problems, and can often be replaced by other features of WXS. In this article I explore both derivation by restriction and derivation by extension of complex types showing the pros and cons of both techniques, as well as showing alternatives to achieving the same results

MIDDLE

As usage of XML and XML schema languages has become more widespread, two primary usage scenarios have developed around XML document validation and XML schemas.

  1. Describing and enforcing the contract between producers and consumers of XML documents: ...
  2. Creating the basis for processing and storing typed data represented as XML documents: ...

CONCLUSION

Based on the current technological landscape the complex type derivation features of WXS may add more problems than they solve in the two most commmon schema use cases. For validation scenarios, derivation by restriction is of marginal value, while derivation by extension is a good way to create modularity as well as encourage reuse. Care must however be taken to consider the ramifications of the various type substitutability features of WXS (xsi:type and substitution groups) when using derivation by extension in scenarios revolving around document validation.

Currently processing and storage of strongly typed XML data is primarily the province of conventional OOP languages and relational databases respectively. This means that certain features of WXS such as derivation by restriction (and to a lesser extent derivation by extension) cause an impedance mismatch between the type system used to describe strongly typed XML and the mechanisms used for processing and storing said XML. Eventually when technologies like XQuery become widespread for processing typed XML and support for XML and W3C XML Schema is integrated into mainstream database products this impedance mismatch will not be important. Until then complex type derivation should be carefully evaluated before being used in situations where W3C XML Schema is primarily being used as a mechanism to create type annotated XML infosets.


 

Categories: XML

October 29, 2003
@ 01:57 PM

A recent article by Phil Howard of Bloor Research on IT-Director.com talks about the Demise of the XML Database. Excerpts below

While you can still buy an XML database purely because it provides faster storage capability and greater functionality than a conventional database, all the erstwhile XML database vendors are increasingly turning to other sources of use for their products.

These other markets basically consist of two different sectors: the use of XML databases as a part of an integration strategy, where the database is used to provide on-the-fly translation for XML documents, and for content management...

The reason why there is this trend away from pure XML storage is because advanced XML capabilities are being introduced by all the leading relational vendors. 

This has been considered "fighting words" from some in the XML database camp such as Mike Champion (works on Tamino XML database) and Kimbro Staken (one of the originators of Apache Xindice). Mike Champion comes up with a number of counter-arguments to the claims in the article I found interesting and felt compelled to comment on. According to Mike

  • It is widely believed that less than a quarter of enterprise data is currently stored in RDBMS systems. This suggests that the market is not "making do" with what the relational database products offer today, but using a wide variety of technologies.

This is actually the mantra of the team I work for at Microsoft. We are responsible for data access technologies (Relational, Object and XML) and our GM is fond of trotting out the quote about "less than a quarter of enterprise data is currently stored in a relational database". A lot of data important to businesses is just siting around on file systems in various Microsoft Office documents and other file formats. The bet across the software industry is that moving all this semi-structured business documents to XML is the right way to go and the first step has been achieved given that modern business productivity software (including the Open Source ones) are moving to fully supporting XML for their document formats. Step one is definitely to get all those memos, contracts and spreadsheets into XML.

  • The main reason OODBMS didn't hit the sweet spot, AFAIK, is that they created a tight coupling between application code and the DBMS. Potential performance gains this allows can outweigh the maintenance challenges in extremely business critical, high transaction volume environments...XML DBMS, on the other hand, inherit XML's suitability for loosely coupling systems, applications, and tools across a wide range of environments.

Totally agree here about the weakness of OODBMSs in creating a tight coupling between applications and the data they accessed. For a more in-depth description of the disadvantages of object oriented databases in comparison to their relational counterparts you can read my article An Exploration of Object Oriented Database Management Systems.

  • Again AFAIK (having only played with OODBMS personally), there is relatively little portability across OODBMS systems; code written for one would be very expensive to adapt to another. Investing in the technology required one to make a risky bet on the vendor who supplied it. This created an environment where the object-relational vendors could prosper by offering only a subset of the features but the absolute assurance that they would be in business for years to come. In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards;

There are two points Mike is making here

  1. There is very little portability across OODBMS systems.
  2. In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards

Based on my experiences with OODBMSs the first claim is entirely accurate, moving data from one OODBMS system was a pain and there was a definitle lack of standardization of APIs and query languages across various products. The second claim is rather suspect to me. I am unaware of any schema, query or API standards that are supported uniformly across XML database products. This isn't to say there aren't standardized W3C branded XML schema languages or query languages nor that there haven't been moves to come up with standard XML database APIs but when last I looked these weren't uniformly supported across many the XML database products and where they were there was a distinct lack of maturity in their offerings. Granted it's been almost a year since  I last looked.

However there is an obvious point about portability that Mike doesn't mention (perhaps because it is so obvious). The entire point of XML is being portable and interoperability, moving data from one XML database to another should simply be a case of "export database as XML" from one and "import XML into database" on the other.

  • The standards of the XML world provide a clearly defined and fairly high bar for those who would seek to take away the market pioneered by the XML DBMS vendors. For better or worse, the XML family of specs is complex and quite challenging to support efficiently in a DBMS system. It's one thing to support, as the RDBMS vendors now do quite well, XML views of structured, typed, relatively "flat" data such as are typically found in RDBMS applications. It is quite another to efficiently and scalably support queries and updates on "document-like" XML with relatively open content models, lots of recursion, mixed content, and where wildcard text comparisions are more frequent than typed value comparisons. The dominant DBMS vendors obviously have talent and money to throw at the problem, but analysts should not assume that they will surpass theese capabilities of the XML DBMS systems anytime soon

OK, this one sounds like FUD. Basically Mike seems to be saying the family of XML specs is so complex (thanks to the W3C, but that's another story) that companies like Oracle, IBM and Microsoft won't be able to come up with ways to query semi-structured data efficiently or perform text comparison searches well so you are best of sticking to a seperate database for your XML data instead of having all your data stored in a single unified store.

So what is my position on the death of native XML databases? Like Phil Howard, I suspect that once XML support becomes [further] integrated into mainstream relational databases (which it  already has to some degree) then native XML databases will be hard pressed to come up with reasons why one would want to buy a separate product for storing XML data distinct from the rest of the data for a business when a traditional relational database can store it all. It's all about integration. Businesses prefer buying a single office productivity suite than mixing and matching word processors, spreadsheets and presentation programs from different vendors. I suspect the same is true when it comes to their data storage needs.


 

Categories: XML

October 29, 2003
@ 12:49 PM
Beware the Hoover Dustette
 

Categories: Ramblings

October 29, 2003
@ 04:33 AM

Get it here

Differences between v1.2.0.43 and v1.2.0.42 below

  • This primarily fixes complaints by Roy Osherove about the responsiveness of RSS Bandit. Eliminated a number of places where GUI locks up when performing tasks by using background threads. Also reduced number of threads used by thread pool when downloading feeds.

  • Added missing source files to installer package, enabling one to compile RSS Bandit from source if so desired.

  • Reverted to behavior where links are opened in a new tabbed browser pane as opposed to the feed display pane

  • Fixed: ArgumentNullException when attempt made to "Refresh Feed" when the top most 'My Feeds' node is selected in the tree view.

 


 

Categories: RSS Bandit

Randy Holloway writes

This session was set up as an open conversation, with only one concrete agenda item.  That being RSS versus Atom.

Interesting, a bunch of developers get together to discuss weblogging technologies and they discuss the most irrelevant piece of the puzzle. For those not keeping track, there are two primary weblog syndication formats in popular usage; the RDF-based RSS 1.0 and Dave Winer's RSS 0.91/RSS 2.0. Developers tend to prefer Dave Winer's specs to the RSS 1.0 branch but due to various interpersonal issues with Dave Winer (unsurprising since he can be quite trying) a bunch of people decided to create a third syndication format (Atom) which duplicates the functionality of the other two primarily to get around the fact that Dave Winer controlled the spec for the most popular feed syndication format. This third format adds little to the table besides fragmenting the feed syndication world which already has to deal with RSS 1.0 vs. RSS 0.91/RSS 2.0 issues. In fact, this redundancy is currently being debated on the atom-syntax list.

There are three major technologies (i.e. XML formats) in the blogging world; feed syndication (RSS), blog editing (Blogger API & MetaWeblog API) and feed list information (OPML). Dave Winer's specs are dominant in all three areas given that he is the author of both the MetaWeblog API and OPML spec. Of the three of the them, RSS is probably the best of the specs and meets the needs of most users except for the Semantic Web folks who want an RDF-based format (i.e. RSS 1.0). On the other hand, there are significant deficiencies in both the MetaWeblog API and OPML. I have blogged about What is wrong with the MetaWeblog API as well as mentioned some of the problems with OPML as a format for storing information about subscribed feeds.

Given that RSS is the best of the weblog related technologies while the blog editing and feed list formats are actually the technologies with problems one might wonder why there is so much energy invested in fixing what isn't that broken instead of trying to tackle actual problems that affect developers and users of blogging tools? One of the answers to this question comes from a comment that Randy Holloway says someone made during the Weblogging BOF

"We don't solve problems, we just talk about them." 

Most of the people engaged in the discussions don't actually write any code or at least not any weblogging related code, so they are unaware of the real problems but instead focus on simple yet irrelevant issues that are easy to grok. This is definitely a case of bike shedding. [Hmmm, I love the term "bike shedding" so much I dug up  the original source of the phrase]

Speaking of Atom, I'm curious as to how all the XML Web Services folks at the Weblogging BOF felt about the fact that the current drafts of the ATOM API uses just HTTP and XML instead of the XML Web Services buzzword soup (SOAP, WSDL, etc) meaning they won't be using Indigo to code against it just plain old System.Xml and System.Net. If ""XML-RPC is a fantastic solution... from a while ago"  I wonder what they think of using just HTTP with no fancy object<->XML mappings, positively prehistoric :)


 

Categories: Ramblings

October 28, 2003
@ 06:58 AM

I finally got around to providing a list of the major features of RSS Bandit on the RSS Bandit wiki this evening. I finally was motivated to do this after reading a blog post by Luke Huttemann, the author of SharpReader and the ensuing comments.Of the ten comments in response to his post about an upcoming release of SharpReader,  four of them were requests for features already in RSS Bandit (actually two of them actually mentioned RSS Bandit by name). I find it amusing that the requestors would rather wait for Luke to add the features to SharpReader than use RSS Bandit. That's definitely some brand loyalty at work. :)

Anyway, in the spirit of democracy there's another vote going on in the RSS Bandit workspace. This time it's to decide what features should be in the next release of RSS Bandit. It seems like structured search (mentioned in a previous blog entry) is high on everyone's priority list and will definitely make it into the next release. Actually the release after the next one since I plan to ship a bugfix release in the next couple of days to clean up some of the sloppiness in the last release since it was rushed to be in time for PDC.

One of the things that warms the cockles of my heart is seeing RSS Bandit mentioned in posts such as  The Magic of Blogging. Having some of the stuff you work on mentioned in the same breath as the word "magic" is pretty fucking cool. It's great to see regular people actually getting some use out software I helped build instead of just building plumbing for developers which is my day job.  

 


 

Categories: RSS Bandit

October 28, 2003
@ 05:36 AM

Clemens Vasters writes

Indigo is the successor technology and the consolidation of DCOM, COM+, Enterprise Services, Remoting, ASP.NET Web Services (ASMX), WSE, and the Microsoft Message Queue. It provides services for building distributed systems all the way from simplistic cross-appdomain message passing and ORPC to cross-platform, cross-organization, vastly distributed, service-oriented architectures providing reliable, secure, transactional, scalable and fast, online or offline, synchronous and asynchronous XML messaging.

I think is truly awesome, they (folks like Don Box, Doug Purdy, Steve Swartz, Scott Gellock, Omri Gazitt , Mike Vernal , John Lambert et al) have not just cooked up a brand new distributed computing platform but have built it on open standards and open technologies meaning that probably for the first time in decades there won't be artifical, politics induced divisions limiting a distributed computing technology to particular platforms or operating systems (i.e. like CORBA, DCOM & Java RMI). The extra goodness is that these open standards are all XML based so crazy XML geeks like me can do stuff like this or people like Sam Ruby can do stuff like that.

The next generation of DCOM, just that this time it interoperates with everyone regardless of what programming language or operating system they are running.

Fucking sweet.


 

Categories: Life in the B0rg Cube | XML

Since I'm not at the Microsoft Professional Developer's conference, I decided to answer questions by attendees about stuff that I am directly or indirectly responsible for right here. So let's roll the questions out

Alan Dean writes

    • You can leverage the XML support in .NET against a data source of your choice (for example, the Registry) by implementing a new XmlReader. We were pointed to work by Mark Fussell on MSDN to do this (Writing XML Providers for Microsoft .NET).

      In Whidbey we will be encouraging people to implement custom XPathNavigator instances instead of custom XmlReaders unless their situation specifically calls for forward-only processing. The ObjectXPathNavigator is an example of a custom navigator

    • Tim indicated that the whitespace handling had been particularly useful in the field. He mentioned a gotcha with empty elements; namely that this
      <elementName></elementName>
      is actually the same as this
         <elementName>
         </elementName>

      except that the second has been pretty-printed with a CRLF and some whitespace indentation. The only way to handle this correctly in your XmlReader, however, is to use _reader.WhitespaceHandling = WhitespaceHandling.None

      This is a legacy from what I like to call our "unconformant by default" era which is how we shipped the XML parser in v1.0 and v1.1. There were a couple of nice features like not erroring on invalid characters and the above feature that people needed in some cases but shouldn't have been the default behavior since it was extremely difficult to figure out how to turn of all the features and get a conformant XML 1.0 parser. In v2.0 we're going to a "conformant by default" mode where people have to go out of their way to read in unconformant XML not the other way around.

    • The current pain suffered by not being able to CreateElement independent of an XmlDocument which led to the ImportNode hack to allow movement between documents. They were sufficiently noisy on this to lead me to think that this is resolved in Whidbey.

      To my knowledge this behavior will remain the same in Whidbey.

Kirk Allen Evans writes

There will be another, more difficult, XML parser, and you will hear Mark Fussell talk about that later this week.“ - Don Box

I would hedge bets that this revolves around XQuery or something, I am looking forward to hearing what this is.

I am extremely curious about what this means myself but don't think Don was talking about XQuery. If anything he probably was talking about APIs but even then we aren't changing much from what we provided in v1 in the area of the XML parser (i.e. the XmlReader class) although the implementation has been rewritten to be faster and more conformant (much props to Helena). I am puzzled about what Don meant by that statement since I've seen Mark's slides and there really isn't anything about another XML parser that users have to learn. Perhaps he meant another XML API which is valid given that we did a ton of work on the XPathDocument for v2.0 which means there may be a lot of people moving from using XmlDocument to using XPathDocument for a number of reasons.

 


 

Categories: Life in the B0rg Cube