October 29, 2003
@ 01:57 PM

A recent article by Phil Howard of Bloor Research on IT-Director.com talks about the Demise of the XML Database. Excerpts below:

While you can still buy an XML database purely because it provides faster storage capability and greater functionality than a conventional database, all the erstwhile XML database vendors are increasingly turning to other sources of use for their products.

These other markets basically consist of two different sectors: the use of XML databases as a part of an integration strategy, where the database is used to provide on-the-fly translation for XML documents, and for content management...

The reason why there is this trend away from pure XML storage is because advanced XML capabilities are being introduced by all the leading relational vendors. 

Some in the XML database camp, such as Mike Champion (who works on the Tamino XML database) and Kimbro Staken (one of the originators of Apache Xindice), have taken these as "fighting words". Mike Champion comes up with a number of counter-arguments to the claims in the article that I found interesting and felt compelled to comment on. According to Mike:

  • It is widely believed that less than a quarter of enterprise data is currently stored in RDBMS systems. This suggests that the market is not "making do" with what the relational database products offer today, but using a wide variety of technologies.

This is actually the mantra of the team I work for at Microsoft. We are responsible for data access technologies (Relational, Object and XML) and our GM is fond of trotting out the quote about "less than a quarter of enterprise data is currently stored in a relational database". A lot of data important to businesses is just sitting around on file systems in various Microsoft Office documents and other file formats. The bet across the software industry is that moving all these semi-structured business documents to XML is the right way to go, and the first step has been achieved given that modern business productivity suites (including the Open Source ones) are moving to fully supporting XML for their document formats. Step one is definitely to get all those memos, contracts and spreadsheets into XML.

  • The main reason OODBMS didn't hit the sweet spot, AFAIK, is that they created a tight coupling between application code and the DBMS. Potential performance gains this allows can outweigh the maintenance challenges in extremely business critical, high transaction volume environments...XML DBMS, on the other hand, inherit XML's suitability for loosely coupling systems, applications, and tools across a wide range of environments.

Totally agree here about the weakness of OODBMSs in creating a tight coupling between applications and the data they access. For a more in-depth description of the disadvantages of object oriented databases in comparison to their relational counterparts, you can read my article An Exploration of Object Oriented Database Management Systems.

  • Again AFAIK (having only played with OODBMS personally), there is relatively little portability across OODBMS systems; code written for one would be very expensive to adapt to another. Investing in the technology required one to make a risky bet on the vendor who supplied it. This created an environment where the object-relational vendors could prosper by offering only a subset of the features but the absolute assurance that they would be in business for years to come. In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards;

There are two points Mike is making here:

  1. There is very little portability across OODBMS systems.
  2. In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards.

Based on my experiences with OODBMSs the first claim is entirely accurate: moving data from one OODBMS to another was a pain and there was a definite lack of standardization of APIs and query languages across the various products. The second claim is rather suspect to me. I am unaware of any schema, query or API standards that are supported uniformly across XML database products. This isn't to say there aren't standardized W3C branded XML schema languages or query languages, nor that there haven't been moves to come up with standard XML database APIs, but when last I looked these weren't uniformly supported across many of the XML database products, and where they were there was a distinct lack of maturity in their offerings. Granted, it's been almost a year since I last looked.

However, there is an obvious point about portability that Mike doesn't mention (perhaps because it is so obvious). The entire point of XML is portability and interoperability, so moving data from one XML database to another should simply be a case of "export database as XML" from one and "import XML into database" on the other.
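To make the "export, then import" idea concrete, here is a minimal sketch. It does not use any real XML database API: the two "databases" are just Python dictionaries mapping a document name to its XML text, and export_database, import_database and the <dump>/<document> wrapper elements are names I made up for illustration. Real products each have their own bulk export and load tools.

```python
import xml.etree.ElementTree as ET

def export_database(store):
    """Serialize a toy store (dict of name -> XML text) into one XML dump.

    The <dump> and <document> wrapper elements are hypothetical.
    """
    root = ET.Element("dump")
    for name, doc in sorted(store.items()):
        entry = ET.SubElement(root, "document", {"name": name})
        entry.append(ET.fromstring(doc))
    return ET.tostring(root, encoding="unicode")

def import_database(dump):
    """Rebuild a toy store from the dump produced by export_database."""
    store = {}
    for entry in ET.fromstring(dump).findall("document"):
        # Each <document> wrapper holds exactly one stored root element.
        store[entry.get("name")] = ET.tostring(entry[0], encoding="unicode")
    return store

# Stand-ins for the source and target databases.
source_db = {
    "memo-1": "<memo><to>Sales</to><body>Ship it.</body></memo>",
    "contract-7": '<contract id="7"><party>Contoso</party></contract>',
}
target_db = import_database(export_database(source_db))
```

The data itself survives the trip because both sides speak XML. What a sketch like this leaves out is everything around the data (indexes, access control, stored queries), which is where product differences would still show up.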

  • The standards of the XML world provide a clearly defined and fairly high bar for those who would seek to take away the market pioneered by the XML DBMS vendors. For better or worse, the XML family of specs is complex and quite challenging to support efficiently in a DBMS system. It's one thing to support, as the RDBMS vendors now do quite well, XML views of structured, typed, relatively "flat" data such as are typically found in RDBMS applications. It is quite another to efficiently and scalably support queries and updates on "document-like" XML with relatively open content models, lots of recursion, mixed content, and where wildcard text comparisons are more frequent than typed value comparisons. The dominant DBMS vendors obviously have talent and money to throw at the problem, but analysts should not assume that they will surpass these capabilities of the XML DBMS systems anytime soon

OK, this one sounds like FUD. Basically Mike seems to be saying the family of XML specs is so complex (thanks to the W3C, but that's another story) that companies like Oracle, IBM and Microsoft won't be able to come up with ways to query semi-structured data efficiently or perform text comparison searches well, so you are best off sticking to a separate database for your XML data instead of having all your data stored in a single unified store.
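For readers who haven't run into the distinction Mike draws between "flat" typed data and "document-like" XML, here is a rough illustration using only Python's standard library. The sample documents and the paragraphs_containing helper are invented for this post. It shows the shape of the two kinds of query, and says nothing about how efficiently any database engine runs them.

```python
import xml.etree.ElementTree as ET

# "Flat" XML: every order has the same fields, like rows in a table.
flat = ET.fromstring(
    "<orders>"
    "<order><id>1</id><total>250.00</total></order>"
    "<order><id>2</id><total>75.50</total></order>"
    "</orders>"
)
# Typed value comparison: treat <total> as a number.
big_orders = [
    order.findtext("id")
    for order in flat.findall("order")
    if float(order.findtext("total")) > 100
]

# "Document-like" XML: recursive sections and mixed content,
# where text is interleaved with inline markup.
doc = ET.fromstring(
    "<section><title>Terms</title>"
    "<para>Payment is due in <b>30 days</b> unless the "
    "<i>buyer</i> disputes the invoice.</para>"
    "<section><title>Disputes</title>"
    "<para>A dispute must be filed in writing.</para></section>"
    "</section>"
)

def paragraphs_containing(root, fragment):
    """Return the text of every <para>, at any depth, containing fragment."""
    hits = []
    for para in root.iter("para"):
        # itertext() stitches together text split up by inline elements.
        text = "".join(para.itertext())
        if fragment.lower() in text.lower():
            hits.append(text)
    return hits

# Wildcard-style text comparison: match "dispute", "disputes", and so on.
matches = paragraphs_containing(doc, "disput")
```

The first query maps naturally onto a relational table with a numeric column. The second has to walk a tree of unknown depth and reassemble text from fragments before it can compare anything, which is the kind of workload Mike is talking about.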

So what is my position on the death of native XML databases? Like Phil Howard, I suspect that once XML support becomes [further] integrated into mainstream relational databases (which it already has to some degree), native XML database vendors will be hard pressed to come up with reasons why one would want to buy a separate product for storing XML data, distinct from the rest of the data for a business, when a traditional relational database can store it all. It's all about integration. Businesses prefer buying a single office productivity suite to mixing and matching word processors, spreadsheets and presentation programs from different vendors. I suspect the same is true when it comes to their data storage needs.


 

Wednesday, 29 October 2003 17:55:27 (GMT Standard Time, UTC+00:00)
"companies like Oracle, IBM and Microsoft won't be able to come up with ways to query semi-structured data efficiently or perform text comparison searches well so you are best off sticking to a separate database for your XML data instead of having all your data stored in a single unified store."

I wouldn't put it that way. My point was that fully supporting all the features of content-oriented XML has been a challenge for my employer, and that's with a 30+ year history of building a DBMS product line (Adabas) that has industrial-strength "semistructured" and text data capabilities. I have no doubt that you folks can do a good job of it once you set your minds to it, but I'm suggesting that customers will believe that when Yukon (or Son of Yukon, or whatever) ships and is provably as good as or better than the NXDB products ... and not when it's said to be shipping Real Soon Now. Some of us remember that Cairo was going to do all sorts of wonderful stuff, shipping in 1993 or so. :-)

Also, the "single unified store" is the crux of the issue that I see here. Those who have a strong business need for a unified object-relational-XML store will be (and probably should be!) coming to MS, IBM, or Oracle. It's the projects that ONLY need an XML store that companies such as ours target. The last I checked, the Tamino footprint is about 1/15th that of Oracle's, and it probably takes about 1/15th the administrative overhead too. That doesn't matter in an enterprise data center that has all the memory, disk, and DBAs it needs to run a large scale ORDBMS, but it matters in situations where there is a lot of data, it's all XML, performance and reliability are critical, but you can't put it in a "glass room." Pure XML DBMS will be a "niche"; the only question is how big that niche is going to be once MS and other companies help people generate exabytes of XML yearly. I believe that a lot more than 20% of it will be in ORXDBMS, but I don't believe the figure will approach 100%.
Thursday, 30 October 2003 17:59:13 (GMT Standard Time, UTC+00:00)
Hybrid (multi-model) databases are what enterprises will always favour due to the inherent heterogeneity of enterprise data. This is basically what OpenLink Virtuoso (http://www.openlinksw.com/virtuoso) is today, and it is where Oracle, DB2, and Microsoft (via Yukon / WinFS) are all headed. Thus, data storage model specificity (Object, Hierarchical, Directed Graph etc.) will never attain enterprise-level longevity, hence the perceived demise of the XML Database.

If we look at data, information (data with context), and knowledge (actionable data with context) as three separate things, the value of the hybrid model database (Virtual Database in our parlance) becomes clearer. The hybrid database can act as an openly accessible repository for data, information, and knowledge, leveraging the appropriate query language for each data context: SQLX (SQL and XML), XQuery, XPath, etc. A model-specific database is simply data-context specific.

BTW - I like the RSS Bandit's CommentAPI support! It makes posting responses like this much easier. Ditto the new post association feature which creates a threaded view of related posts and comments.
