XML For You and Me, Your Mama and Your Cousin Too

January 6, 2004

@ 04:17 PM

Jon Udell recently wrote in his post entitled XML for the rest of us

By the way, Adam Bosworth said a great many other interesting things in his XML 2003 talk. For those of you not inclined to watch this QuickTime clip -- and in particular for the search crawlers -- I would like to enter the following quote into the public record.

The reason people get scared of queries is that it's hard to say 'You can send me this kind of query, but not that kind of query.' And therefore it's hard to have control, and people end up building other systems. It's not clear that you always want query. Sometimes people can't handle arbitrary statements. But we never have queries. I don't have a way to walk up to Salesforce and Siebel and say tell me everything I know about the customer -- in the same way. I don't even have a way to say tell me everything about the customers who meet the following criteria. I don't have a way to walk up to Amazon and Barnes and Noble and in a consistent way say 'Find me all the books reviewed by this person,' or even, 'Find me the reviews for this book.' I can do that for both, but not in the same way. We don't have an information model. We don't have a query model. And for that, if you remember the dream we started with, we should be ashamed.

I think we can fix this. I think we can take us back to a world that's a simple world. I think we can go back to a world where there are just XML messages flowing back and forth between...resources. <snipped />

Three things jump out at me from that passage. First, the emphasis on XML query. My instincts have been leading me in that direction for a while now, and much of my own R&D in 2003 was driven by a realization that XPath is now a ubiquitous technology with huge untapped potential. Now, of course, XQuery is coming on like a freight train.

When Don and I hung out over the holidays this was one of the things we talked about. Jon's post has been sitting flagged for follow up in my aggregator for a while. Here are my thoughts...

The main problem is that there are a number of websites which have the same information but do not provide a uniform way to access this information and when access mechanisms to information are provided do not allow ad-hoc queries. So the first thing that is needed is a shared view (or schema) of what this information looks like which is the shared information model Adam talks about. There are two routes you can take with this, one is to define a shared data model with the transfer syntax being secondary (i.e. use RDF) while another is to define a shared data model and transfer syntax (i.e use XML). In most cases, people have tended to pick the latter.

Once an XML representation of the relevant information users are interested has been designed (i.e. the XML schema for books, reviews and wishlists that could be exposed by sites like Amazon or Barnes & Nobles) the next technical problem to be solved is uniform access mechanisms. The eternal REST vs. SOAP vs. XML-RPC that has plagued a number of online discussions. Then there's deployment, adoption and evangelism.

Besides the fact that I've glossed over the significant political and business reasons that may or may not make such an endeavor fruitful we still haven't gotten to Adam's Nirvana. We still need a way to process the data exposed by these web services in arbitrary ways. How does one express a query such as "Find all the CDs released between 1990 and 1999 that Dare Obasanjo rated higher than 3 stars"? Given the size of the databases hosted by such sites would it make more sense to ship the documents to the client or some mid-tier which then performs the post-processing of the raw data instead of pushing such queries down to the database? What are the performance ramifications of exposing your database to anyone with a web browser and allowing them to run ad-hoc queries instead of specially optimized, canned queries?

At this point if you are like me you might suspect that defining that the web service endpoints return the results of performing canned queries which can then be post processed by the client may be more practical then expecting to be able to ship arbitrary SQL/XML, XQuery or XPath queries to web service end points.

The main problem with what I've described is that it takes a lot of effort. Coming up with standardized schema(s) and distributed computing architecture for a particular industry then driving adoption is hard even when there's lots of cooperation let alone in highly competitive markets.

In an ideal world, this degree of boot strapping would be unnecessary. After all, people can already build the kinds of applications Adam described today by screen scraping [X]HTML although they tend to be brittle. What the software industry should strive for is a way to build such applications in a similarly loosely connected manner in the XML Web Services world without requiring the heavy investment of human organizational effort that is currently needed. This was the initial promise of XML Web Services which like Adam I am ashamed has not come to pass. Instead many seem to be satisfied with reinventing DCOM/CORBA/RMI with angle-brackets (then replacing it with "binary infosets"). Unfortunate...

Categories: XML

« Forward Motion | Home | XQuery on the Web »

Tuesday, 06 January 2004 19:17:29 (GMT Standard Time, UTC+00:00)

My thoughts on what I call "WebXQuery" and what it takes to combine XQuery and the Web are at http://www.pacificspirit.com/blog/2004/01/06/WebXQuery

David Orchard

Friday, 16 January 2004 08:04:11 (GMT Standard Time, UTC+00:00)

Dare, there's already a solution to this (which Adam created at MS five years ago) -- virtual XML views to unify different data sources. So Amazon and BN and every other bookseller comes up with their own XML format. Somebody else comes along and creates a universal "bookstore" schema and maps each of them to it using an XML view. No loss of performance in smart XML Query implementations.

And if that universal schema becomes widely adopted, then eventually all the booksellers adopt it and the virtual XML views can go away. I think eventually you'll get this for documents, where instead of translating WordML to XHTML (as Don is doing), you create a universal document schema and map both WordML and XHTML into it. (And if the mappings are reversible, then you get your translators for free.)

Just my $0.02.

Michael Brundage

Comments are closed.

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for XML For You and Me, Your Mama and Your Cousin Too - Dare Obasanjo's weblog