Wednesday, 29 October 2003 - Dare Obasanjo's weblog

October 29, 2003

@ 01:57 PM

A recent article by Phil Howard of Bloor Research on IT-Director.com talks about the Demise of the XML Database. Excerpts below

While you can still buy an XML database purely because it provides faster storage capability and greater functionality than a conventional database, all the erstwhile XML database vendors are increasingly turning to other sources of use for their products.

These other markets basically consist of two different sectors: the use of XML databases as a part of an integration strategy, where the database is used to provide on-the-fly translation for XML documents, and for content management...

The reason why there is this trend away from pure XML storage is because advanced XML capabilities are being introduced by all the leading relational vendors.

Some in the XML database camp, such as Mike Champion (who works on the Tamino XML database) and Kimbro Staken (one of the originators of Apache Xindice), have taken this as "fighting words". Mike Champion comes up with a number of counter-arguments to the claims in the article, which I found interesting and felt compelled to comment on. According to Mike

It is widely believed that less than a quarter of enterprise data is currently stored in RDBMS systems. This suggests that the market is not "making do" with what the relational database products offer today, but using a wide variety of technologies.

This is actually the mantra of the team I work for at Microsoft. We are responsible for data access technologies (Relational, Object and XML) and our GM is fond of trotting out the quote about "less than a quarter of enterprise data is currently stored in a relational database". A lot of data important to businesses is just sitting around on file systems in various Microsoft Office documents and other file formats. The bet across the software industry is that moving all these semi-structured business documents to XML is the right way to go, and the first step has been achieved given that modern business productivity software (including the Open Source ones) is moving to fully supporting XML for its document formats. Step one is definitely to get all those memos, contracts and spreadsheets into XML.

The main reason OODBMS didn't hit the sweet spot, AFAIK, is that they created a tight coupling between application code and the DBMS. Potential performance gains this allows can outweigh the maintenance challenges in extremely business critical, high transaction volume environments...XML DBMS, on the other hand, inherit XML's suitability for loosely coupling systems, applications, and tools across a wide range of environments.

Totally agree here about the weakness of OODBMSs in creating a tight coupling between applications and the data they accessed. For a more in-depth description of the disadvantages of object oriented databases in comparison to their relational counterparts you can read my article An Exploration of Object Oriented Database Management Systems.

Again AFAIK (having only played with OODBMS personally), there is relatively little portability across OODBMS systems; code written for one would be very expensive to adapt to another. Investing in the technology required one to make a risky bet on the vendor who supplied it. This created an environment where the object-relational vendors could prosper by offering only a subset of the features but the absolute assurance that they would be in business for years to come. In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards;

There are two points Mike is making here

There is very little portability across OODBMS systems.
In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards

Based on my experiences with OODBMSs the first claim is entirely accurate: moving data from one OODBMS to another was a pain and there was a definite lack of standardization of APIs and query languages across various products. The second claim is rather suspect to me. I am unaware of any schema, query or API standards that are supported uniformly across XML database products. This isn't to say there aren't standardized W3C branded XML schema languages or query languages, nor that there haven't been moves to come up with standard XML database APIs, but when last I looked these weren't uniformly supported across many of the XML database products, and where they were there was a distinct lack of maturity in their offerings. Granted, it's been almost a year since I last looked.

However there is an obvious point about portability that Mike doesn't mention (perhaps because it is so obvious). The entire point of XML is portability and interoperability; moving data from one XML database to another should simply be a case of "export database as XML" from one and "import XML into database" on the other.

The standards of the XML world provide a clearly defined and fairly high bar for those who would seek to take away the market pioneered by the XML DBMS vendors. For better or worse, the XML family of specs is complex and quite challenging to support efficiently in a DBMS system. It's one thing to support, as the RDBMS vendors now do quite well, XML views of structured, typed, relatively "flat" data such as are typically found in RDBMS applications. It is quite another to efficiently and scalably support queries and updates on "document-like" XML with relatively open content models, lots of recursion, mixed content, and where wildcard text comparisons are more frequent than typed value comparisons. The dominant DBMS vendors obviously have talent and money to throw at the problem, but analysts should not assume that they will surpass these capabilities of the XML DBMS systems anytime soon

OK, this one sounds like FUD. Basically Mike seems to be saying the family of XML specs is so complex (thanks to the W3C, but that's another story) that companies like Oracle, IBM and Microsoft won't be able to come up with ways to query semi-structured data efficiently or perform text comparison searches well, so you are best off sticking to a separate database for your XML data instead of having all your data stored in a single unified store.

So what is my position on the death of native XML databases? Like Phil Howard, I suspect that once XML support becomes [further] integrated into mainstream relational databases (which it already has to some degree), native XML database vendors will be hard pressed to come up with reasons why one would want to buy a separate product for storing XML data, distinct from the rest of the data for a business, when a traditional relational database can store it all. It's all about integration. Businesses prefer buying a single office productivity suite to mixing and matching word processors, spreadsheets and presentation programs from different vendors. I suspect the same is true when it comes to their data storage needs.

Categories: XML

October 29, 2003

@ 12:49 PM

Comments [0]

Making Love To Vacuum Cleaners

Beware the Hoover Dustette

Categories: Ramblings

October 29, 2003

@ 04:33 AM

Comments [16]

RSS Bandit 1.2.0.43 Released

Get it here

Differences between v1.2.0.43 and v1.2.0.42 below

This primarily fixes complaints by Roy Osherove about the responsiveness of RSS Bandit. Eliminated a number of places where GUI locks up when performing tasks by using background threads. Also reduced number of threads used by thread pool when downloading feeds.
Added missing source files to installer package, enabling one to compile RSS Bandit from source if so desired.
Reverted to behavior where links are opened in a new tabbed browser pane as opposed to the feed display pane
Fixed: ArgumentNullException when attempt made to "Refresh Feed" when the top most 'My Feeds' node is selected in the tree view.
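
The responsiveness item in the list above (moving slow work off the GUI thread and capping the number of download threads) can be illustrated with a rough sketch. It is written in Python rather than .NET and is not taken from the RSS Bandit source; `fetch_feed`, `refresh_feeds` and the `max_workers` default are hypothetical names and values chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_feed(url, timeout=10):
    """Download one feed and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def refresh_feeds(urls, fetch=fetch_feed, max_workers=4):
    """Download feeds on a small, bounded worker pool.

    Returns two dicts keyed by URL: downloaded bodies and raised exceptions.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each pending download back to the URL it belongs to.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one bad feed should not stop the others
                errors[url] = exc
    return results, errors
```

`refresh_feeds` itself blocks until every download finishes, so a GUI application would call it from a background thread and hand the results back to the UI thread, which is the part that keeps the window responsive.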

Categories: RSS Bandit

October 28, 2003

@ 08:16 AM

Comments [3]

Weblogging Birds-Of-A-Feather At PDC

Randy Holloway writes

This session was set up as an open conversation, with only one concrete agenda item. That being RSS versus Atom.

Interesting, a bunch of developers get together to discuss weblogging technologies and they discuss the most irrelevant piece of the puzzle. For those not keeping track, there are two primary weblog syndication formats in popular usage: the RDF-based RSS 1.0 and Dave Winer's RSS 0.91/RSS 2.0. Developers tend to prefer Dave Winer's specs to the RSS 1.0 branch, but due to various interpersonal issues with Dave Winer (unsurprising since he can be quite trying) a bunch of people decided to create a third syndication format (Atom), which duplicates the functionality of the other two, primarily to get around the fact that Dave Winer controlled the spec for the most popular feed syndication format. This third format brings little to the table besides fragmenting the feed syndication world, which already has to deal with RSS 1.0 vs. RSS 0.91/RSS 2.0 issues. In fact, this redundancy is currently being debated on the atom-syntax list.

There are three major technologies (i.e. XML formats) in the blogging world: feed syndication (RSS), blog editing (Blogger API & MetaWeblog API) and feed list information (OPML). Dave Winer's specs are dominant in all three areas given that he is the author of both the MetaWeblog API and the OPML spec. Of the three, RSS is probably the best of the specs and meets the needs of most users except for the Semantic Web folks who want an RDF-based format (i.e. RSS 1.0). On the other hand, there are significant deficiencies in both the MetaWeblog API and OPML. I have blogged about What is wrong with the MetaWeblog API as well as mentioned some of the problems with OPML as a format for storing information about subscribed feeds.

Given that RSS is the best of the weblog related technologies while the blog editing and feed list formats are actually the technologies with problems one might wonder why there is so much energy invested in fixing what isn't that broken instead of trying to tackle actual problems that affect developers and users of blogging tools? One of the answers to this question comes from a comment that Randy Holloway says someone made during the Weblogging BOF

"We don't solve problems, we just talk about them."

Most of the people engaged in the discussions don't actually write any code or at least not any weblogging related code, so they are unaware of the real problems but instead focus on simple yet irrelevant issues that are easy to grok. This is definitely a case of bike shedding. [Hmmm, I love the term "bike shedding" so much I dug up the original source of the phrase]

Speaking of Atom, I'm curious as to how all the XML Web Services folks at the Weblogging BOF felt about the fact that the current drafts of the Atom API use just HTTP and XML instead of the XML Web Services buzzword soup (SOAP, WSDL, etc.), meaning they won't be using Indigo to code against it, just plain old System.Xml and System.Net. If "XML-RPC is a fantastic solution... from a while ago", I wonder what they think of using just HTTP with no fancy object<->XML mappings. Positively prehistoric :)
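
To make "just HTTP and XML" concrete, here is a minimal sketch in Python (standing in for System.Xml and System.Net). It only builds a request and does not send it. The element names, the endpoint URL and the content type are placeholders made up for the illustration; they are not taken from any draft of the Atom API.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request


def build_entry_request(endpoint, title, content):
    """Build (but do not send) an HTTP POST whose body is a small XML document."""
    entry = ET.Element("entry")  # hypothetical element names, not from a spec
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "content").text = content
    body = ET.tostring(entry, encoding="utf-8")  # serialized XML as bytes
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/xml"},
    )


# Placeholder endpoint; passing the request to urllib.request.urlopen would send it.
request = build_entry_request("http://example.invalid/blog", "Hello", "First post")
```

There is no envelope, no service description and no object mapping layer here: the XML document is the message and the HTTP verb carries the intent.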

Categories: Ramblings

October 28, 2003

@ 06:58 AM

Comments [2]

What Exactly Can RSS Bandit Do?

I finally got around to providing a list of the major features of RSS Bandit on the RSS Bandit wiki this evening. I was finally motivated to do this after reading a blog post by Luke Huttemann, the author of SharpReader, and the ensuing comments. Of the ten comments in response to his post about an upcoming release of SharpReader, four were requests for features already in RSS Bandit (two of them actually mentioned RSS Bandit by name). I find it amusing that the requestors would rather wait for Luke to add the features to SharpReader than use RSS Bandit. That's definitely some brand loyalty at work. :)

Anyway, in the spirit of democracy there's another vote going on in the RSS Bandit workspace. This time it's to decide what features should be in the next release of RSS Bandit. It seems like structured search (mentioned in a previous blog entry) is high on everyone's priority list and will definitely make it into the next release. Actually the release after the next one since I plan to ship a bugfix release in the next couple of days to clean up some of the sloppiness in the last release since it was rushed to be in time for PDC.

One of the things that warms the cockles of my heart is seeing RSS Bandit mentioned in posts such as The Magic of Blogging. Having some of the stuff you work on mentioned in the same breath as the word "magic" is pretty fucking cool. It's great to see regular people actually getting some use out of software I helped build instead of just building plumbing for developers, which is my day job.

Categories: RSS Bandit

October 28, 2003

@ 05:36 AM

Comments [3]

Indigo

Clemens Vasters writes

Indigo is the successor technology and the consolidation of DCOM, COM+, Enterprise Services, Remoting, ASP.NET Web Services (ASMX), WSE, and the Microsoft Message Queue. It provides services for building distributed systems all the way from simplistic cross-appdomain message passing and ORPC to cross-platform, cross-organization, vastly distributed, service-oriented architectures providing reliable, secure, transactional, scalable and fast, online or offline, synchronous and asynchronous XML messaging.

I think this is truly awesome. They (folks like Don Box, Doug Purdy, Steve Swartz, Scott Gellock, Omri Gazitt, Mike Vernal, John Lambert et al.) have not just cooked up a brand new distributed computing platform but have built it on open standards and open technologies, meaning that probably for the first time in decades there won't be artificial, politics-induced divisions limiting a distributed computing technology to particular platforms or operating systems (as with CORBA, DCOM & Java RMI). The extra goodness is that these open standards are all XML based so crazy XML geeks like me can do stuff like this or people like Sam Ruby can do stuff like that.

The next generation of DCOM, just that this time it interoperates with everyone regardless of what programming language or operating system they are running.

Fucking sweet.

Categories: Life in the B0rg Cube | XML

October 27, 2003

@ 06:54 PM

Comments [0]

Answers To Some Questions About XML in Whidbey

Since I'm not at the Microsoft Professional Developer's Conference, I decided to answer questions by attendees about stuff that I am directly or indirectly responsible for right here. So let's roll the questions out

Alan Dean writes

You can leverage the XML support in .NET against a data source of your choice (for example, the Registry) by implementing a new XmlReader. We were pointed to work by Mark Fussell on MSDN to do this (Writing XML Providers for Microsoft .NET).

In Whidbey we will be encouraging people to implement custom XPathNavigator instances instead of custom XmlReaders unless their situation specifically calls for forward-only processing. The ObjectXPathNavigator is an example of a custom navigator
Tim indicated that the whitespace handling had been particularly useful in the field. He mentioned a gotcha with empty elements; namely that this
<elementName></elementName>
is actually the same as this
<elementName> </elementName>
except that the second has been pretty-printed with a CRLF and some whitespace indentation. The only way to handle this correctly in your XmlReader, however, is to use _reader.WhitespaceHandling = WhitespaceHandling.None

This is a legacy from what I like to call our "unconformant by default" era, which is how we shipped the XML parser in v1.0 and v1.1. There were a couple of nice features, like not erroring on invalid characters and the above feature, that people needed in some cases but that shouldn't have been the default behavior, since it was extremely difficult to figure out how to turn off all the features and get a conformant XML 1.0 parser. In v2.0 we're going to a "conformant by default" mode where people have to go out of their way to read in unconformant XML, not the other way around.
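
The empty-element gotcha above is not specific to .NET. As a rough illustration using Python's standard library (not System.Xml), a conformant parser reports the whitespace inside the pretty-printed element as character data, and treating it as "empty" is a choice the application makes. The helper name `element_text` and its flag are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# The same element, once written compactly and once pretty-printed.
compact = ET.fromstring("<item><name></name></item>")
pretty = ET.fromstring("<item><name>\r\n    </name></item>")


def element_text(element, ignore_whitespace_only=False):
    """Return an element's text; optionally treat whitespace-only text as empty."""
    text = element.text or ""
    if ignore_whitespace_only and not text.strip():
        return ""
    return text


# By default the two documents differ: one has no text, the other has whitespace.
print(repr(element_text(compact.find("name"))))
print(repr(element_text(pretty.find("name"))))
# Opting in to ignoring whitespace-only text makes them compare equal.
print(repr(element_text(pretty.find("name"), ignore_whitespace_only=True)))
```

The opt-in flag plays the role that the whitespace handling setting plays in the quoted advice: the parser stays conformant and the caller decides when insignificant whitespace should be dropped.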
The current pain suffered by not being able to CreateElement independent of an XmlDocument which led to the ImportNode hack to allow movement between documents. They were sufficiently noisy on this to lead me to think that this is resolved in Whidbey.

To my knowledge this behavior will remain the same in Whidbey.

Kirk Allen Evans writes

"There will be another, more difficult, XML parser, and you will hear Mark Fussell talk about that later this week." - Don Box

I would hedge bets that this revolves around XQuery or something, I am looking forward to hearing what this is.

I am extremely curious about what this means myself but don't think Don was talking about XQuery. If anything he probably was talking about APIs but even then we aren't changing much from what we provided in v1 in the area of the XML parser (i.e. the XmlReader class) although the implementation has been rewritten to be faster and more conformant (much props to Helena). I am puzzled about what Don meant by that statement since I've seen Mark's slides and there really isn't anything about another XML parser that users have to learn. Perhaps he meant another XML API which is valid given that we did a ton of work on the XPathDocument for v2.0 which means there may be a lot of people moving from using XmlDocument to using XPathDocument for a number of reasons.

Categories: Life in the B0rg Cube

October 27, 2003

@ 03:39 PM

Comments [1]

While The Cat's Away

So it looks like my boss, his boss, his boss's boss, and his boss's boss's boss are all out at the Microsoft Professional Developer's Conference 2003 (aka PDC) where folks will get a sneak peek at the next versions of Windows, SQL Server and Visual Studio. Thus it looks like there won't be much whip cracking going on this week, so I can spend time working on my pet projects for work.

XML Developer Center on MSDN: Mark Fussell recently posted complaints about the quality of some articles on XML he'd recently read. I generally feel the same way about websites dedicated to articles about XML. Of all the developer sites devoted to XML there are only two I've seen that aren't utter crap: XML.com and IBM's XML developerWorks site. Even these are kind of hit or miss. XML.com usually publishes about 3 articles a week, of which one is excellent, one is good and one is crap. Which is fine except that the excellent article is typically about something that isn't directly applicable to what I work on. The problem with IBM's developerWorks is that all the code is Java-centric, which doesn't help me since I work with the .NET Framework.
After seeing some of what Tim Ewald did with producing content around Microsoft technologies and XML Web Services via the Web Services Developer Center on MSDN, I talked to some of the folks at MSDN about creating something similar for XML content. This was green lighted a while ago but preparations for PDC have stopped this from taking off until next month. In the meantime, I'll be creating my content plan and coming up with a list of authors (both Microsoft employees and non-Microsoft folks) for the new dev center.

So far I've gotten a couple of folks lined up internally as well as some excellent non-Microsoft folks like Daniel Cazzulino, Christoph Schittko and Oleg Tkachenko. Definitely expect some changes to the XML Home Page on MSDN in the next few months.
Sequential XPath and Pull Based XML Parsing: In 2001, Arpan Desai presented on Sequential XPath at XML 2001. Relevant bits from the paper

This paper will provide an explanation of and the subset of XPath which we will tentatively dub: Sequential XPath, or SXPath for ease of use. SXPath allows a event-based XML parser, such as a typical SAX-compliant XML parser, to execute XPath-like expressions without the need of more memory consumption than is normally used within a ~~sequential~~ pull-based parser.
...
By creating a streaming XML parser which utilizes Sequential XPath, one is able to reap the inherent benefits of a streaming parser with the querying power of XPath. By defining this proper subset of XPath, we enable developers and users to utilize XML in a wide array of applications thought to be too performance sensitive for traditional XML processing.
The code for the technology outlined above has actually been gathering dust on some hard drives at work for a while. I'm currently in the process of liberating this code so that everyone can get access to the combined benefits of pull-based parsing and XPath based matching of nodes. Hopefully folks should be able to download classes similar to the ones outlined in Arpan's presentation in the next few weeks. Hopefully by Christmas, everyone will be able to write code similar to the following snippet taken from Tim Bray's XML is too Hard for Programmers

while (<STDIN>) {
  next if (X<meta>X);
  if    (X<h1>|<h2>|<h3>|<h4>X)
  { $divert = 'head'; }
  elsif (X<img src="/^(.*\.jpg)$/i>X)
  { &proc_jpeg($1); }
  # and so on...
}
Of course you'll have to substitute C#, VB.NET or any one of the various languages targeted at the .NET Framework for the Perl code above.
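
To give a feel for combining streaming parsing with path matching, here is a small sketch in Python. It is not the Sequential XPath code described above and supports far less than that subset: the patterns are plain slash-separated element names matched against the tail of the current path, and `stream_matches` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET


def stream_matches(source, patterns):
    """Yield (pattern, element) for each element whose path ends with a pattern.

    Patterns are simple slash-separated element names such as "body/h1".
    """
    wanted = [(pattern, pattern.split("/")) for pattern in patterns]
    path = []  # names of the currently open elements, root first
    for event, element in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(element.tag)
            continue
        # "end" event: the element is complete, so text and attributes are set.
        for pattern, steps in wanted:
            if path[-len(steps):] == steps:
                yield pattern, element
        path.pop()
        element.clear()  # discard content that has already been handled


sample = io.BytesIO(
    b"<html><head><meta name='x'/></head>"
    b"<body><h1>Title</h1><img src='a.jpg'/><img src='b.png'/></body></html>"
)
for pattern, element in stream_matches(sample, ["body/h1", "img"]):
    print(pattern, element.tag, element.text, element.get("src"))
```

Because each element is cleared once it has been handled, a match on an outer element will see children that have already been emptied; that is the trade-off for not holding the whole document's content in memory.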

Categories: XML

October 25, 2003

@ 05:47 PM

Comments [5]

RSS Bandit 1.2.0.42 Released

Get it here

Differences between v1.1.0.36 and v1.2.0.42 below.

Support for password protected feeds using either HTTPS/SSL or HTTP Authentication. This feature can be tested using Steven Garrity's test feeds.
The ability to store and retrieve feed list from remote locations such as a dasBlog blog, an FTP server or a network file share. This enables users utilizing RSS aggregators on multiple machines to synchronize their feed list from a single point. This feature has been called a subscription harmonizer by some.
Multiple feeds downloaded simultaneously instead of one at a time thus reducing download time.
When saving as OPML, the hierarchy of the feed list is preserved instead of writing out a flat structure.
Default theme for viewing items changed to resemble that of a mail reader like Outlook Express.
Added support for <dc:author> and <author> elements to a number of templates including the default theme.
FIXED: Feed list corruption when importing an OPML file where xmlUrl="" for some feeds
FIXED: NullReferenceException involving streams when accessing feeds after RSS Bandit has been running for a long time.
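
The OPML item in the list above (keeping the category hierarchy instead of flattening it) can be sketched as follows. This is an illustration in Python, not RSS Bandit's implementation; the input shape (a nested dict) and the function name `feeds_to_opml` are assumptions, and the `text`, `type` and `xmlUrl` attributes follow common OPML subscription-list usage.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(title, tree):
    """Serialize a nested feed list to OPML text.

    `tree` maps a name either to a feed URL (str) or to a nested dict (a category).
    """
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    def add(parent, items):
        for name, value in items.items():
            if isinstance(value, dict):
                # A category becomes an outline that contains other outlines.
                add(ET.SubElement(parent, "outline", text=name), value)
            else:
                ET.SubElement(parent, "outline", text=name, type="rss", xmlUrl=value)

    add(body, tree)
    return ET.tostring(opml, encoding="unicode")


print(feeds_to_opml("My Feeds", {
    "Blogs": {"Example blog": "http://example.invalid/rss.xml"},
    "Example news": "http://example.invalid/news.xml",
}))
```

A flat export would write every feed as a direct child of the body element; nesting outline elements is what lets another aggregator rebuild the same category tree on import.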

Categories: RSS Bandit

October 25, 2003

@ 02:13 PM

Comments [1]

Something Cool From Microsoft You Might Not See At The PDC

"This paper proposes extending popular object-oriented programming languages such as C#, VB or Java with native support for XML. In our approach XML documents or document fragments become first class citizens. This means that XML values can be constructed, loaded, passed, transformed and updated in a type-safe manner. The type system extensions, however, are not based on XML Schemas. We show that XSDs and the XML data model do not fit well with the class-based nominal type system and object graph representation of our target languages. Instead we propose to extend the C# type system with new structural types that model XSD sequences, choices, and all-groups. We also propose a number of extensions to the language itself that incorporate a simple but expressive query language that is influenced by XPath and SQL. We demonstrate our language and type system by translating a selection of the XQuery use cases."

From Programming with Rectangles, Triangles, and Circles by Erik Meijer and Wolfram Schulte

I talk to Erik about this stuff all the time, so it's great to finally see some of the thoughts and discussions around this topic actually written down in a research paper. According to Erik's blog post from a few weeks ago, he'll actually be presenting about this at XML 2003.

Categories: XML

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
