October 1, 2003
@ 12:58 AM
Dealing with the Data Access Impedance Mismatch

Thanks to Erik Meijer for pointing me to the article The Impedance Imperative Tuples + Objects + Infosets = Too Much Stuff! The team I work for deals with data access technologies (relational, object and XML, aka ROX), so this impedance mismatch is something we have to rationalize all the time.

Up until quite recently the primary impedance mismatch application developers had to deal with was the Object<->Relational impedance mismatch. Usually data was stored in a relational database but primarily accessed, manipulated and transmitted over the network as objects via some object-oriented programming language. Many felt (and still feel) that this impedance mismatch is a significant problem. Attempts to reduce it have led to technologies such as object-oriented databases and various object-relational mapping tools. These solutions take the point of view that the problem of having developers deal with two domains, or having two sets of developers (DB developers and application coders), is solved by making everything look like a single domain: objects. One could also argue that the flip side of this is to push as much data manipulation as you can to the database via technologies like stored procedures, while mainly manipulating and transmitting the data on the wire in objects that closely model the relational database, such as the .NET Framework's DataSet class.
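To make the Object<->Relational mismatch concrete, here is a minimal hand-rolled mapping, written in Python purely for illustration. The `customers`/`orders` tables and the `Customer`/`Order` classes are hypothetical. The relational side returns flat joined rows linked by foreign keys, while the object side wants a customer that owns a list of orders. The loop below is roughly the glue code an object-relational mapping tool would generate for you.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    total: float  # the database's CHECK (total >= 0) is NOT enforced here


@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list[Order] = field(default_factory=list)


def load_customers(conn: sqlite3.Connection) -> list[Customer]:
    """Turn flat joined rows into Customer objects that own their Orders."""
    customers: dict[int, Customer] = {}
    rows = conn.execute(
        "SELECT c.id, c.name, o.id, o.total "
        "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id "
        "ORDER BY c.id, o.id"
    )
    for cust_id, name, order_id, total in rows:
        # The join repeats the customer columns on every row;
        # the object model keeps exactly one Customer per key.
        customer = customers.get(cust_id)
        if customer is None:
            customer = customers[cust_id] = Customer(cust_id, name)
        if order_id is not None:  # LEFT JOIN yields NULLs for customers without orders
            customer.orders.append(Order(order_id, total))
    return list(customers.values())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total REAL CHECK (total >= 0)
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.5);
        """
    )
    for c in load_customers(conn):
        print(c.name, [o.order_id for o in c.orders])
```

Running it prints `Ada [10, 11]` and then `Grace []`. Note that the `CHECK (total >= 0)` constraint lives only in the database. Nothing in the `Order` class preserves it, which is a small example of how constraints get lost when moving between models.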

Recently a third player has appeared on the scene: XML. It is becoming more common for data to be stored in a relational database, mainly manipulated as objects, but transmitted on the wire as XML. Given the previously stated impedance mismatch and the fact that XML is mainly just a syntactic device, one would expect the XML representations of the data being transmitted to simply be serialized versions of objects, relational data, or some subset of both. However, what seems to be happening is slightly more complicated. The software world seems to be moving towards using XML Web Services built on standard technologies such as HTTP, XML, SOAP and WSDL to transmit data between applications. To quote the WSDL 1.1 W3C Note:
WSDL recognizes the need for rich type systems for describing message formats, and supports the XML Schemas specification (XSD) [11] as its canonical type system
So this introduces a third type system into the mix: W3C XML Schema structures and datatypes. W3C XML Schema has a number of concepts that do not map to concepts in either the object-oriented or the relational model. To properly access and manipulate XML typed using W3C XML Schema you need new data access mechanisms such as XQuery. Now application developers have to deal with three domains, or we need three sets of developers. The first instinct is to continue with the meme of making everything look like objects, which is what a number of XML Web Services toolkits do today, including Microsoft's .NET Framework via the XML Serialization technology. This tends to be particularly lossy because object-oriented systems have traditionally lacked the richness to describe the constraints that can be expressed in a typical relational database, let alone the even richer constraints that are possible with W3C XML Schema. Thus such object-oriented systems must evolve to capture not only the semantics of the relational model but those of the W3C XML Schema model as well.

Another approach could be to make everything look like XML and use that as the primary data access mechanism. Technologies already exist to make relational databases look like XML and to make objects look like XML. Unsurprisingly to those who know me, this is the approach I favor. The relational model could also serve as a universal data access mechanism if one figured out how to map the constraints of the W3C XML Schema model onto it. The .NET Framework's DataSet already does some translation of an XML structure defined in a W3C XML Schema to a relational structure.
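The lossiness of the "make everything look like objects" approach can be sketched in a few lines. This is a Python stand-in, not the behavior of .NET's XML Serialization. It assumes a hypothetical schema type `ZipCode` derived by restriction from `xs:string` with a `pattern` facet. The `Address` class and the element names are likewise invented for illustration. Once the value lands in a plain `str` field, the facet is gone and must be re-implemented by hand.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical W3C XML Schema type, derived by restriction from xs:string:
#
#   <xs:simpleType name="ZipCode">
#     <xs:restriction base="xs:string">
#       <xs:pattern value="[0-9]{5}"/>
#     </xs:restriction>
#   </xs:simpleType>


@dataclass
class Address:
    street: str
    zip_code: str  # the object model only knows "string"; the pattern facet is lost


def to_xml(addr: Address) -> str:
    """Naive object-to-XML serialization: one child element per field."""
    root = ET.Element("Address")
    ET.SubElement(root, "Street").text = addr.street
    ET.SubElement(root, "ZipCode").text = addr.zip_code
    return ET.tostring(root, encoding="unicode")


def zip_is_schema_valid(value: str) -> bool:
    """Hand-written copy of the pattern facet (XSD patterns match the whole value)."""
    return re.fullmatch(r"[0-9]{5}", value) is not None


if __name__ == "__main__":
    bad = Address("1 Main St", "ABCDE")
    print(to_xml(bad))  # serializes without complaint
    print(zip_is_schema_valid(bad.zip_code))
```

The invalid address serializes to `<Address><Street>1 Main St</Street><ZipCode>ABCDE</ZipCode></Address>`, and only the separate, hand-maintained check reports `False`. Keeping the constraint requires either a schema validator at the boundary or an object type system rich enough to express it.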

The problem with all three approaches I just described is that they are somewhat lossy or involve hacking one model into becoming the uber-model. XML trees don't handle the graph structures of objects well; objects can't handle concepts like W3C XML Schema's derivation by restriction; and so on. There is also a fourth approach, endorsed by Erik Meijer in his paper Unifying Tables, Objects, and Documents, in which one creates a new unified model that is a superset of the pertinent features of the three existing models. Of course, this involves introducing a fourth model.
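The point that XML trees don't handle object graphs well can be shown with a minimal Python sketch. The `Person` class and the element names are hypothetical. Two people share the same manager object. A straightforward tree serialization writes that manager out twice, and reading the tree back produces two distinct manager objects, so the sharing is lost.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    manager: Optional["Person"] = None


def to_element(p: Person) -> ET.Element:
    """Serialize a Person as a tree; the manager is nested as a child element."""
    el = ET.Element("Person", name=p.name)
    if p.manager is not None:
        ET.SubElement(el, "Manager").append(to_element(p.manager))
    return el


def from_element(el: ET.Element) -> Person:
    """Rebuild a Person from the tree; every nested element becomes a new object."""
    mgr_el = el.find("Manager/Person")
    manager = from_element(mgr_el) if mgr_el is not None else None
    return Person(el.get("name"), manager)


if __name__ == "__main__":
    boss = Person("Carol")
    root = ET.Element("Team")
    for member in (Person("Ann", boss), Person("Bob", boss)):
        root.append(to_element(member))
    print(ET.tostring(root, encoding="unicode"))  # "Carol" is written out twice

    ann, bob = (from_element(e) for e in root.findall("Person"))
    print(ann.manager is bob.manager)  # False: one shared manager became two copies
```

A cycle (for example, a manager whose `manager` field points back at a report) would make `to_element` recurse forever. The usual workaround is an identifier/reference scheme such as W3C XML Schema's `xs:ID`/`xs:IDREF`, which recovers the graph but means the data is no longer naturally a tree.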

If you are interested in which approach(es) we decided to take on my team, then you should be at PDC. [I had more to write, but I'll be late for a meeting if I keep this up.]


Don't Get Too Excited

Fumiaki writes
PDC is coming closer. We are all excited about what will be shown there. But remember, PDC is for future.

Anyone remember PDC 2000? The bits were still young there. We used webserviceutil.exe and DataSetCommand. VB.NET was not like the one we use today. Knowledge we got from the PDC 2000 was not useful in the real life 2000, and most of 2001, although today the knowledge is the advantage for us. PDC 2003 will be the same. ...

So, I would like to ask speakers a favor. Please tell us more of why you made it that way, than what you made. We will eventually gather information about the new bits from books, MSDN, and so on. Attending PDC should be our advantage because we will almost exclusively know why the features are there, why smart people at Microsoft decide its architecture that way. It is that kind of knowledge that will be our real advantage. That is why I am going, even it takes 10 hours to L.A. from Japan.
I have to agree with him here. The bits you'll get at PDC will most likely change before the final versions ship. It could be anywhere from one to three years before some of this stuff ships, which is a long time in software development. The code currently checked in for the bits I own already differs in a number of ways from what PDC attendees will get: no changes in logical functionality, but class renamings, API refactorings and the like. As time goes on I expect more changes, so the key thing folks should be trying to get out of PDC is the main concepts and functionality, not nitty-gritty API issues (although we want your feedback if something is broken) or specific details about features. I was inspired to write the entry above by Fumiaki's statement that the why is more important than the what. I'll be helping with some PDC presentations even though I won't be there, and will make sure this idea at least permeates the material on data access and XML.

Also, next month's issue of XML Journal should have an article by me which discusses some of the thought process that went into the improvements we made to some of the core XML APIs in the .NET Framework.


Stock Options vs. Stock Grants

The guy over at Corp Law Blog has an entry entitled Greatest IPO Ever, where he links to a number of posts he's made about Microsoft's plan to replace stock option grants with grants of actual stock. Informative stuff if you are a Microsoft employee or are interested in the details of stock options and the like.


Get yourself a News Aggregator and subscribe to my RSS feed

Disclaimer: The above comments do not represent the thoughts, intentions, plans or strategies of my employer. They are solely my opinion.