Last week Andrew Conrad told me to check out a recent article by Adam Bosworth in the ACM Queue because he wondered what I thought about it. I was rather embarrassed to note that although I'd seen some mention of it online, I hadn't read it. I read it today and, as usual, Adam Bosworth is on point.

The article is entitled Learning from THE WEB and it begins by listing eight "unintuitive lessons" we have learned from the Web. The lessons are listed below:

  1. Simple, relaxed, sloppily extensible text formats and protocols often work better than complex and efficient binary ones.

  2. It is worth making things simple enough that one can harness Moore’s law in parallel.

  3. It is acceptable to be stale much of the time.

  4. The wisdom of crowds works amazingly well.

  5. People understand a graph composed of tree-like documents (HTML) related by links (URLs).

  6. Pay attention to physics.

  7. Be as loosely coupled as possible.

  8. KISS. Keep it (the design) simple and stupid.

Where the paper gets interesting is that it then tries to apply these lessons to XML. Remember that Adam was one of the founders of the XML team at Microsoft and knows a thing or two about it. So he writes:

In my humble opinion, however, we ignored or forgot lessons 3, 4, and 5. Lesson 3 tells us that elements in XML with values that are unlikely to change for some known period of time (or where it is acceptable that they are stale for that period of time, such as the title of a book) should be marked to say this. XML has no such model.
Lesson 4 says that we shouldn’t over-invest in making schemas universally understood.
Lessons 1 and 5 tell us that XML should be easy to understand without schemas

I totally agree with his assessment of what we should have taken from lessons 4 & 5. However, the idea of marking an element in an XML file as 'relatively unchanging' in a generic way is lost on me. He then goes on to point out more of the problems with XML [and the Semantic Web/RDF]:

There are some interesting implications in all of this.

One is that the Semantic Web is in for a lot of heartbreak. It has been trying for five years to convince the world to use it. It actually has a point. XML is supposed to be self-describing so that loosely coupled works. If you require a shared secret on both sides, then I’d argue the system isn’t loosely coupled, even if the only shared secret is a schema. What’s more, XML itself has three serious weaknesses in this regard:

  1. It doesn’t handle binary data well.
  2. It doesn’t handle links.
  3. XML documents tend to be monolithic.

Now it's gotten pretty interesting, and at this point Adam throws the curve ball.

Recently, an opportunity has arisen to transcend these limitations. RSS 2.0 has become an extremely popular format on the Web. RSS 2.0 and Atom (which is essentially isomorphic) both support a base schema that provides a model for sets. Atom’s general model is a container (a <feed>) of <entry> elements in which each <entry> may contain any namespace scoped elements it chooses (thus any XML), must contain a small number of required elements (<id>, <updated>, and <title>), and may contain some other well-known ones in the Atom namespace such as <link>s. Even better, Atom clearly says that the order doesn’t matter. This immediately gives a simple model for sets missing in XML.
Atom also supports links of other sorts, such as comments, so clearly an Atom entry can contain links to related feeds (e.g., Reviews for a Restaurant or Complaints for a Customer) or links to specific posts. This gives us the network and graph model that is missing in XML. Atom contains a simple HTTP-based way to INSERT, DELETE, and REPLACE <entry>s within a <feed>. There is a killer app for all these documents because the browsers already can view RSS 2.0 and Atom and, hopefully, will soon natively support the Atom protocol as well, which would mean read and write capabilities.
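To make the model in that quote concrete, here is a rough Python sketch of a feed as a set of entries. It is only an illustration under my own assumptions: the `http://example.org/restaurant` extension namespace, the `urn:example:...` ids, and the helper names (`make_entry`, `insert_entry`, `replace_entry`, `delete_entry`) are all made up, and the three helpers are in-memory stand-ins for what the Atom protocol would do over HTTP, not a real client.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
EXT_NS = "http://example.org/restaurant"  # hypothetical extension namespace

ET.register_namespace("", ATOM_NS)
ET.register_namespace("r", EXT_NS)


def atom(tag):
    """Return the namespace-qualified name of an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def make_entry(entry_id, title, updated, related=(), extra=None):
    """Build an <entry> with the required elements plus optional extras."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = entry_id
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("updated")).text = updated
    # Links to related feeds (e.g. reviews for a restaurant).
    for href in related:
        ET.SubElement(entry, atom("link"), {"rel": "related", "href": href})
    # Any namespace-scoped elements the publisher chooses to add.
    for name, value in (extra or {}).items():
        ET.SubElement(entry, f"{{{EXT_NS}}}{name}").text = value
    return entry


# Order doesn't matter, so the entries are modelled as a set keyed by <id>.
entries = {}


def insert_entry(entry):
    """In-memory stand-in for INSERT."""
    entries[entry.find(atom("id")).text] = entry


def replace_entry(entry):
    """In-memory stand-in for REPLACE; the entry must already exist."""
    key = entry.find(atom("id")).text
    if key not in entries:
        raise KeyError(key)
    entries[key] = entry


def delete_entry(entry_id):
    """In-memory stand-in for DELETE."""
    del entries[entry_id]


insert_entry(make_entry(
    "urn:example:r1", "Chez Example", "2005-11-10T00:00:00Z",
    related=["http://example.org/r1/reviews"],
    extra={"cuisine": "French"},
))
insert_entry(make_entry("urn:example:r2", "Cafe Sample", "2005-11-10T00:00:00Z"))
replace_entry(make_entry(
    "urn:example:r1", "Chez Example (renamed)", "2005-11-11T00:00:00Z",
    related=["http://example.org/r1/reviews"],
    extra={"cuisine": "French"},
))
delete_entry("urn:example:r2")

# Serialize the surviving entries inside a <feed> container.
feed = ET.Element(atom("feed"))
ET.SubElement(feed, atom("id")).text = "urn:example:restaurants"
ET.SubElement(feed, atom("title")).text = "Restaurants"
ET.SubElement(feed, atom("updated")).text = "2005-11-11T00:00:00Z"
for entry in entries.values():
    feed.append(entry)

feed_xml = ET.tostring(feed, encoding="unicode")
print(feed_xml)
```

The sketch does not validate anything; it just shows that a keyed set of entries, each free to carry its own namespaced elements and links, is about all the machinery the quoted model needs.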

Now that's deep. Why not move up one level of abstraction from exchanging XML documents to exchanging Web Feeds (RSS/Atom documents)? Adam ends his article by throwing a challenge out to database vendors, who he believes have failed to learn the lessons of the Web, by writing:

All of this has profound implications for databases. Today databases violate essentially every lesson we have learned from the Web.

  1. Are simple relaxed text formats and protocols supported? No.
  2. Have databases enabled people to harness Moore’s law in parallel? This would mean that databases could scale more or less linearly to handle both the volume of the requests coming in and even the complexity. The answer is no.
  3. Do databases optimize caching when it is OK to be stale? No.
  4. Do databases let schemas evolve for a set of items using a bottom-up consensus/tipping point? Obviously not.
  5. Do databases handle flexible graphs (or trees) well? No, they do not.
  6. Have the databases learned from the Web and made their queries simple and flexible? No, just ask a database if it has anyone who, if they have an age, are older than 40; and if they have a city, live in New York; and if they have an income, earn more than $100,000. This is a nightmare because of all the tests for NULL.
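The query in item 6 is easy to try out. Below is a small Python sketch using the standard `sqlite3` module; the `people` table, its columns, and the sample rows are my own invention for illustration, not anything from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: every attribute except the name is optional (NULL).
conn.execute(
    "CREATE TABLE people (name TEXT, age INTEGER, city TEXT, income INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Ann", 45, "New York", 150000),  # every known value qualifies
        ("Bob", None, "New York", 120000),  # age unknown, so not excluded
        ("Cy", 35, "New York", 200000),  # age known and too young
        ("Dee", 50, None, None),  # city and income unknown
        ("Eve", 60, "Boston", 300000),  # city known and wrong
    ],
)

# The obvious query silently drops every row that has a NULL in a tested
# column, because a comparison against NULL is never true.
naive_query = """
    SELECT name FROM people
    WHERE age > 40 AND city = 'New York' AND income > 100000
    ORDER BY name
"""

# "If they have an X, then ..." needs an explicit NULL test per column.
null_aware_query = """
    SELECT name FROM people
    WHERE (age IS NULL OR age > 40)
      AND (city IS NULL OR city = 'New York')
      AND (income IS NULL OR income > 100000)
    ORDER BY name
"""

naive = [row[0] for row in conn.execute(naive_query)]
null_aware = [row[0] for row in conn.execute(null_aware_query)]

print(naive)       # ['Ann']
print(null_aware)  # ['Ann', 'Bob', 'Dee']
```

Three optional attributes already mean three `IS NULL OR` clauses, and that per-column boilerplate is presumably what the quote means by "all the tests for NULL".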
The article ends by arguing that database vendors should add native support for the Atom Protocol and wire format. I find this interesting since, based on conversations on the atom-protocol list, it is clear that Google is very interested in the Atom API. Perhaps they have already built this Atom store that Adam is arguing for and will expose the Atom API as a way to interact with it. Perhaps this Atom store, accessible via Atom feeds and the Atom API, is Google Base? Speculation is fun.

As for me, I tend to agree with Adam that moving up layers of abstraction is a good idea. We've all agreed on XML; the next thing to do is to agree on applications of XML. We've all agreed on RSS; the next thing to do is figure out what scenarios are enabled by the subscribe model. This is one of the reasons why I disliked the unnecessary fragmentation caused by the RSS vs. Atom battles. As for whether we need to start seeing databases with native RSS/Atom support, I think it's too early in the game to jump there. Heck, RDF has been around for a while but we are just now seeing some decent things happening with SPARQL and various RDF stores. Similarly with XML and XQuery. I don't think enough lessons have been learned from either to start thinking about what it would mean to have a native RSS/Atom store. It is an interesting idea though.


Friday, 11 November 2005 01:35:48 (GMT Standard Time, UTC+00:00)
While you say that you agree with the guy on "moving up layers of abstraction", that's not what he's talking about at all.

He's trying to shove abstracted-out features down. (To judge by this article alone, there's no evidence that he understands the concept of abstraction at all.) He wants Flickr-style tagging in the database at the row level! He wants databases to send back query results in [feed] and [entry] elements! (Oh, and as an aside, I had angle brackets around those, but your blogging software called it "A potentially dangerous Request.Form value". I thought numeric character references were pretty much a solved problem.)

I rambled on a bit about this on my blog (see home page link on this comment if you're interested), but in nitpicking I didn't really address the most important fault with what he was saying about databases.

It should be obvious to anyone that a lot of the best web sites - sites which demonstrate what's great about how the web works - are based on databases. You can build applications that do enjoy the qualities he's praising, on top of databases to which they don't apply. Databases don't *need* to work that way to be useful, and some of what he suggests would be detrimental.

It's a bad article.
Monday, 14 November 2005 15:04:37 (GMT Standard Time, UTC+00:00)
I would have to agree with Fitzpatrick. Bosworth is seriously mixing technologies. You can't directly compare Atom with XML, and you can't expect the kind of features Atom provides in XML, because XML is a foundation language. It should not tailor itself to a specific domain, but instead be extensible for a vast array of uses, which it is.

Do databases need an XML-based query language? Eh... not my place to say (I haven't even looked at the XML support in SQL 2005 yet.... shame on me). But Atom? Definitely not. Doing so would be custom tailoring a query interface for a specific domain that has nothing to do with databases, purely because it's the going trend. Could you imagine the mess that we'd have today if SQL Server natively integrated every spec that came along and gained any level of popularity?

A database is a flexible, multi-purpose, data store, and XML is a flexible, multi-purpose, data format. And in that light, databases and XML mesh well... in the abstract. Neither should support the concrete realm of specific application domains, such as syndication. If we do need change, it needs to be a framework or spec built on top of XML and databases, and not into them natively. Google saw this and created Google Base. It's not a direct answer, but it's in the ballpark. Great, but now they want to try to blame the database vendors for not keeping up with their level of innovation? Now that's just dirty marketing....

I heard Adam's keynote at Dreamforce '05 (the freakishly cultish annual circus), and while I don't recall what he said specifically, I do recall ranting endlessly in disgust afterward, and since then Adam has been on my list of scary people with a detrimentally high level of influence.