January 1, 2004
@ 10:51 AM

Sean Campbell or Scott Swigart writes:

I want this also.  I want a theory that unifies objects and data.  We're not there yet.

With a relational database, you have data and relationships, but no objects.  If you want objects, that's your problem, and the problem isn't insignificant.  There's been a parade of tools and technologies, and all of them have fallen short on the promise of bridging the gap.  There's the DataSet, which seeks to be one bucket for all data.  It's an object, but it doesn't give you an object view of the actual data.  It leaves you doing things like ds.Tables["Customer"].Rows[0]["FirstName"].ToString().  Yuck.  Then there are Typed DataSets.  These give you a pseudo-object view of the data, letting you do: ds.Customer[0].FirstName.  Better, but still not what I really want.  And it's just code-gen on top of the DataSet.  There's no real "Customer" object here.
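
To make the contrast concrete, here is a minimal sketch (the Customer table and FirstName column are invented for the example; the typed property names are the usual code-gen output):

using System;
using System.Data;

class DataSetContrast
{
    static void Main()
    {
        // Build a DataSet with a Customer table by hand; in real code this
        // would come back from a SqlDataAdapter.Fill() call.
        DataSet ds = new DataSet();
        DataTable customers = ds.Tables.Add("Customer");
        customers.Columns.Add("FirstName", typeof(string));
        customers.Rows.Add(new object[] { "Ada" });

        // Untyped access: string keys and casts everywhere, and a typo in
        // "Customer" or "FirstName" only blows up at runtime.
        string name = ds.Tables["Customer"].Rows[0]["FirstName"].ToString();
        Console.WriteLine(name);

        // A Typed DataSet (code-gen'd from an XSD) lets you write:
        //     string name = typedDs.Customer[0].FirstName;
        // which compiles against real properties, but underneath it is still
        // the same DataSet machinery; there is no standalone Customer object.
    }
}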

Then, there are ObjectSpaces that let you do the XSD three-step to map classes to relational data in the database.  With ObjectSpaces you get real, bona fide objects.  However, this is just a bunch of goo piled on top of ADO.NET, and I question the scalability of this approach. 
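
For flavor, here is roughly what an ObjectSpaces query looked like in the PDC bits. The type and method names (ObjectSpace, ObjectQuery, GetObjectReader) and the mapping-file argument are from memory of the preview and may not match what ships:

using System.Data.ObjectSpaces;   // namespace in the PDC preview bits
using System.Data.SqlClient;

public class Customer              // a plain class, mapped to a table
{
    public string Id;
    public string City;
}

class ObjectSpacesSketch
{
    static void Main()
    {
        // The "XSD three-step": an object schema, a relational schema and a
        // mapping file tie the Customer class to a Customers table.
        ObjectSpace os = new ObjectSpace("map.xml",
            new SqlConnection("server=.;database=Northwind;Integrated Security=SSPI"));

        ObjectQuery query = new ObjectQuery(typeof(Customer), "City = 'London'");
        ObjectReader reader = os.GetObjectReader(query);
        foreach (Customer c in reader)
        {
            // Real, bona fide objects; ADO.NET doing the work underneath.
            System.Console.WriteLine(c.Id);
        }
    }
}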

Then there are UDTs.  In this case, you've got objects all the way into the database itself, with the object serialized as one big blob into a single column.  To find specific objects, you have to index the properties that you care about; otherwise you're looking at not only a table scan, but rehydrating every row into an object to see if it's the object you're looking for.
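
A sketch of the C# side of a Yukon UDT follows; the attribute and namespace names are from the PDC-era bits and may differ by release, and the Places table and Location column in the closing comment are made up:

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;    // namespace may differ in the PDC bits

[Serializable]
[SqlUserDefinedType(Format.Native)]  // the whole struct serializes into one column
public struct Point : INullable
{
    private bool isNull;
    public double X;
    public double Y;

    public bool IsNull { get { return isNull; } }

    public static Point Null
    {
        get { Point p = new Point(); p.isNull = true; return p; }
    }

    public static Point Parse(SqlString s)
    {
        if (s.IsNull) return Null;
        string[] parts = s.Value.Split(',');
        Point p = new Point();
        p.X = double.Parse(parts[0]);
        p.Y = double.Parse(parts[1]);
        return p;
    }

    public override string ToString()
    {
        return isNull ? "NULL" : X + "," + Y;
    }
}

// To query on a property without rehydrating every row, you promote it into
// an indexable computed column on the server, e.g. (T-SQL):
//     ALTER TABLE Places ADD LocX AS (Location.X) PERSISTED;
//     CREATE INDEX IX_Places_LocX ON Places(LocX);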

There's always straight XML, but at this point you're essentially saying, "There are no objects".  You have data, and you have schema.  If you're seeing objects, it's just an optical illusion on top of the angle brackets.  In fact, with Web services, it's emphatically stated that you're not transporting objects, you're transporting data.  If that data happens to be the serialization of some object, that's nice, but don't assume for one second that that object will exist on the other end of the wire.
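
A quick illustration with the .NET XmlSerializer (the Customer class and its fields are invented for the example):

using System;
using System.IO;
using System.Xml.Serialization;

public class Customer
{
    public string FirstName;   // public state survives the trip
    public int OrderCount;

    public bool IsPreferred()  // behavior does not; methods never cross the wire
    {
        return OrderCount > 10;
    }
}

class WireDemo
{
    static void Main()
    {
        Customer c = new Customer();
        c.FirstName = "Ada";
        c.OrderCount = 12;

        // What actually travels is just angle brackets describing state, e.g.
        // <Customer><FirstName>Ada</FirstName><OrderCount>12</OrderCount></Customer>
        XmlSerializer ser = new XmlSerializer(typeof(Customer));
        StringWriter sw = new StringWriter();
        ser.Serialize(sw, c);
        Console.WriteLine(sw.ToString());

        // The receiver only needs something that matches the schema; it may
        // deserialize into a completely different class, or no class at all.
    }
}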

And speaking of XML, Yukon can store XML as XML.  Which is to say you have semi-structured data, as XML, stored relationally, which you could probably map to an XML property of an object with ObjectSpaces.
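
A sketch of where this presumably heads, assuming the xml data type behaves as shown at PDC (the CustomerDocs table is hypothetical):

using System.Data;
using System.Data.SqlClient;

class YukonXmlSketch
{
    static void Main()
    {
        // Hypothetical table using Yukon's xml data type:
        //     CREATE TABLE CustomerDocs (Id int PRIMARY KEY, Doc xml)
        using (SqlConnection conn = new SqlConnection(
            "server=.;database=Test;Integrated Security=SSPI"))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "INSERT INTO CustomerDocs (Id, Doc) VALUES (@id, @doc)", conn);
            cmd.Parameters.Add("@id", SqlDbType.Int).Value = 1;
            cmd.Parameters.Add("@doc", SqlDbType.NVarChar).Value =
                "<customer><name>Ada</name></customer>";
            cmd.ExecuteNonQuery();

            // The column holds real XML (semi-structured data), not an opaque
            // string blob, so the server can query into it rather than
            // round-tripping everything through the client.
        }
    }
}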

What happens when worlds collide?  Will ObjectSpaces work with Yukon UDTs and XML?

Oh, and don't forget XML Views, which let you view your relational data as XML on the client, even though it's really relational.
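
That's the SQLXML annotated-schema machinery. A sketch using the SQLXML 3.0 managed classes (the CustomerView.xsd mapping schema and the XPath query are hypothetical):

using System.IO;
using Microsoft.Data.SqlXml;   // SQLXML 3.0 managed classes

class XmlViewSketch
{
    static void Main()
    {
        // An annotated XSD (mapping schema) declares how tables and columns
        // project into elements and attributes; queries are XPath over that
        // virtual XML view, translated to SQL on the server.
        SqlXmlCommand cmd = new SqlXmlCommand(
            "Provider=SQLOLEDB;Server=.;Database=Northwind;Integrated Security=SSPI");
        cmd.CommandType = SqlXmlCommandType.XPath;
        cmd.SchemaPath = "CustomerView.xsd";
        cmd.RootTag = "Customers";
        cmd.CommandText = "Customer[@City='London']";

        using (Stream s = cmd.ExecuteStream())
        using (StreamReader r = new StreamReader(s))
        {
            System.Console.WriteLine(r.ReadToEnd());  // XML out, relational underneath
        }
    }
}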

<snip />

So for a given scenario, do all of you know which technology to pick?  I'm not too proud to admit that honestly I don't.  In fact, I honestly don't know if I'll have time to stress test every one of these against a number of real problem domains and real data.  And something tells me that if you pick the wrong tool for the job, and it doesn't pan out, you could be pretty hosed. 

Today we have a different theory for everything.  I want the Theory of Everything.

I've written about this problem in the past, although at the time I didn't have a name for the Theory of Everything; now I do. In my previous post, entitled Dealing with the Data Access Impedance Mismatch, I wrote:

The team I work for deals with data access technologies (relational, object, and XML, aka ROX), so this impedance mismatch is something that we have to rationalize all the time.

Up until quite recently the primary impedance mismatch application developers had to deal with was the Object<->Relational impedance mismatch. Usually data was stored in a relational database but primarily accessed, manipulated and transmitted over the network as objects via some object oriented programming language. Many felt (and still feel) that this impedance mismatch is a significant problem. Attempts to reduce this impedance mismatch have led to technologies such as object oriented databases and various object relational mapping tools. These solutions take the point of view that the problem of having developers deal with two domains, or having two sets of developers (DB developers and application coders), is solved by making everything look like a single domain: objects. One could also argue that the flip side of this is to push as much data manipulation as you can to the database via technologies like stored procedures, while mainly manipulating and transmitting the data on the wire in objects that closely model the relational database, such as the .NET Framework's DataSet class.
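
To make this concrete, the stored-procedure-plus-DataSet shape looks like this in today's .NET Framework (GetCustomers is a hypothetical stored procedure):

using System.Data;
using System.Data.SqlClient;

class StoredProcToDataSet
{
    static void Main()
    {
        // The data manipulation lives in the database; the client just fills
        // a DataSet whose shape mirrors the relational result, not a domain
        // object model.
        SqlConnection conn = new SqlConnection(
            "server=.;database=Northwind;Integrated Security=SSPI");
        SqlCommand cmd = new SqlCommand("GetCustomers", conn);
        cmd.CommandType = CommandType.StoredProcedure;

        SqlDataAdapter adapter = new SqlDataAdapter(cmd);
        DataSet ds = new DataSet();
        adapter.Fill(ds, "Customer");   // rows, columns and relations; no behavior
    }
}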

Recently a third player has appeared on the scene: XML. It is becoming more common for data to be stored in a relational database, mainly manipulated as objects, but transmitted on the wire as XML. One would then think that, given the previously stated impedance mismatch and the fact that XML is mainly just a syntactic device, the XML representations of the data being transmitted would be sent as serialized versions of objects, relational data, or some subset of both. However, what seems to be happening is slightly more complicated. The software world seems to be moving towards using XML Web Services built on standard technologies such as HTTP, XML, SOAP and WSDL to transmit data between applications. And taken from the WSDL 1.1 W3C Note:

WSDL recognizes the need for rich type systems for describing message formats, and supports the XML Schemas specification (XSD) [11] as its canonical type system

So this introduces a third type system into the mix: W3C XML Schema structures and datatypes. W3C XML Schema has a number of concepts that do not map to concepts in either the object oriented or relational models. To properly access and manipulate XML typed using W3C XML Schema you need new data access mechanisms such as XQuery. Now application developers have to deal with three domains, or we need three sets of developers.

The first instinct is to continue with the meme where you make everything look like objects, which is what a number of XML Web Services toolkits do today, including Microsoft's .NET Framework via the XML Serialization technology. This tends to be particularly lossy because traditionally object oriented systems do not have the richness to describe the constraints that are possible to create with a typical relational database, let alone the even richer constraints that are possible with W3C XML Schema. Thus such object oriented systems must evolve to capture not only the semantics of the relational model but those of the W3C XML Schema model as well.

Another approach could be to make everything look like XML and use that as the primary data access mechanism. Technologies already exist to make relational databases look like XML and to make objects look like XML. Unsurprisingly to those who know me, this is the approach I favor. The relational model can also be viewed as a universal data access mechanism if one could figure out how to map the constraints of the W3C XML Schema model onto it. The .NET Framework's DataSet already does some translation of an XML structure defined in a W3C XML Schema to a relational structure.
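
For instance, that translation is already exposed directly on the DataSet (customers.xsd here is a hypothetical schema describing nested Customer and Order elements):

using System;
using System.Data;

class XsdToRelational
{
    static void Main()
    {
        // Feed a W3C XML Schema to a DataSet and it infers tables, columns
        // and relations from the element declarations.
        DataSet ds = new DataSet();
        ds.ReadXmlSchema("customers.xsd");

        foreach (DataTable table in ds.Tables)
        {
            Console.WriteLine(table.TableName);
        }

        // Constructs with no relational analog (derivation by restriction,
        // choice groups, facets) are approximated or lost in the translation.
    }
}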

The problem with all three approaches I just described is that they are somewhat lossy or involve hacking one model into becoming the uber-model. XML trees don't handle the graph structures of objects well, objects can't handle concepts like W3C XML Schema's derivation by restriction, and so on. There is also a fourth approach, endorsed by Erik Meijer in his paper Unifying Tables, Objects, and Documents, where one creates a new unified model which is a superset of the pertinent features of the three existing models. Of course, this involves introducing a fourth model.

The fourth model mentioned above is the unified theory of everything that Scott or Sean is asking for. Since I made that post, my friend Erik Meijer has been busy and has produced another paper, Programming with Circles, Triangles and Rectangles, that shows what such a unification of the ROX triangle would look like if practically implemented as a programming language. In this paper Erik describes the research language Xen, which seems to be the nirvana Scott or Sean is looking for. However, this is a research project, not something Sean or Scott is likely to be able to use in production in the next year.

The main problem is that Microsoft has provided .NET developers with too much choice when it comes to building apps that retrieve data from a relational store, manipulate the data in memory, then either push the updated information back to the store or send it over the wire. The one thing I have learned working as a PM on core platform technologies is that our customers HATE choice. It means having to learn multiple technologies and making decisions about which is best, sometimes risking making the wrong choice. This is exactly the problem Scott or Sean is having with the technologies we announced at the recent Microsoft Professional Developer Conference (PDC), which should be shipping this year. What technology should I use, and when should I use it?

This is something the folks on my team (WebData, the data access technology team) know we have to deal with when all this stuff ships later this year, and we will deal with it to the best of our ability. Our users want architectural guidance and best practices, which we'll endeavor to make available as soon as possible.

The first step in providing this information to our users is the presentations and whitepaper we made available after PDC: Data Access Design Patterns: Navigating the Data Access Maze (PowerPoint slides) and Data Access Support in Visual Studio .NET code named “Whidbey”. Hopefully these will provide Sean, Scott and the rest of our data access customers with some of the guidance needed to make the right choice. Any feedback on the slides or document would be appreciated. Follow-up documents should show up on MSDN in the next few months.

Friday, 02 January 2004 22:24:56 (GMT Standard Time, UTC+00:00)
Unifying theory: Data is abstract; encoded data has been constrained. Understanding the constraints of the encodings, and how they affect the impedance of utilizing the data, should be the primary factor in deciding which encoding to use and when. It is not required that a solution utilize only a single encoding.

The linked whitepaper only told me to use the encoding with which I am most familiar. Not terribly useful. It would have been nice to understand how the constraints of the encodings affect the impedances imposed by the various boundaries created by abstractions, architectures, and usage scenarios. I don't think the problem is too much choice, but a lack of understanding of the effects of utilizing the various encodings. It's nice to have the multiple abstractions, but sometimes the user needs to understand the realities of why there is more than one possible encoding of their data. (Imagine trying to hang a picture if you didn't understand the differences between a sledgehammer and a claw hammer.)
Mike Julier
Saturday, 03 January 2004 16:47:18 (GMT Standard Time, UTC+00:00)
Another aspect of the problem is the integration with tools. It seems architecture is usually more or less a generation ahead of the tools. In a sense, a suboptimal architecture with very deep tooling support is preferable (from a "typical" customer's point of view).

The unified theory of everything is a great goal, but it should be pursued in quantum leaps, not in a continuum. At every step the tools, the literature, the samples and the current architecture should be synchronized. Currently I find MS a bit lacking in this respect.
lionel