June 6, 2004
@ 04:18 AM

One of my friends, Joshua Allen, is a fan of RDF and Semantic Web technologies. Given that I respect his opinion a lot, I keep trying to delve into RDF and its family of technologies every couple of months to see what it provides to the world of data access and information interchange above and beyond existing technologies. Recently I discovered that there are some in the RDF camp who position it as a "better XML". The first example of this I saw was an old article by Tim Berners-Lee entitled Why RDF model is different from the XML model. According to Tim, the note is an attempt to answer the question, "Why should I use RDF - why not just XML?". However, instead of answering the question, his note left me with more questions than answers. The pivotal point for me in Tim Berners-Lee's note is the following excerpt:

Things you can do with RDF which you can't do with XML include

  • You can parse the semantic tree, which ends up giving you a set of (possibly mutually referential) triples and then you can use the ones you want ignoring the ones you don't understand.

Problems with basing your understanding on the structure include

  • Without having gone to the trouble of getting the schema, or having an application hand-programmed to recognise a particular document type, you can't pick up any semantic information from a document;
  • When an XML schema changes, it could typically introduce new intermediate elements (like "details" in the tree above or "div" in HTML). These may or may not invalidate any query which has been based on the structure of the document.
  • If you haven't gone to the trouble of making a semantic model, then you may not have a well defined one.

It seems that the point being argued is that with RDF you can get more understanding of the information in the document than with just XML. Given that one could consider RDF as just a logical model layered on top of an XML document (e.g. RDF/XML), I find it hard to understand how viewing some XML document through RDF-colored glasses buys one so much more understanding of the data.
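To make the claim in the excerpt concrete, here is a minimal Python sketch of "use the triples you want, ignore the ones you don't understand". It is only an illustration: the triples are plain tuples rather than the output of a real RDF parser, and every URI and name in it (`UNDERSTOOD`, `understood_triples`, the example.org terms) is a hypothetical I made up for the example.

```python
# Each triple is (subject, predicate, object). All URIs are hypothetical.
triples = [
    ("http://example.org/post/1", "http://example.org/terms/title", "RDF vs. XML"),
    ("http://example.org/post/1", "http://example.org/terms/author", "http://example.org/people/alice"),
    ("http://example.org/post/1", "http://example.org/other/mood", "skeptical"),
]

# Predicates this (hypothetical) application knows how to handle.
UNDERSTOOD = {
    "http://example.org/terms/title",
    "http://example.org/terms/author",
}

def understood_triples(all_triples, understood=UNDERSTOOD):
    """Keep only triples whose predicate the application understands."""
    return [t for t in all_triples if t[1] in understood]

kept = understood_triples(triples)
for subject, predicate, obj in kept:
    print(subject, predicate, obj)
```

The third triple is silently dropped because its predicate is not in the understood set. The filter depends only on predicate names and not on where a statement appeared in the serialized document.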

Recently I discovered a presentation entitled REST, Self-description, and XML by Mark Baker. This presentation discusses the ideas in Tim Berners-Lee's note in more depth and in a way I finally understand. The first key idea in Mark's presentation is the notion of "self describing" data formats, which was also covered in Tim Berners-Lee's presentation at WWW2002 entitled Specs Count. The core tenets of "self describing" data formats are covered in slide 10 and slide 11 of Mark's presentation. A "self describing" data format contains all the data needed to figure out how to process the format from publicly accessible specs. For example, an HTTP response tells you the MIME type of the document, which can be used to locate the appropriate RFC that governs how the format should be processed. In the case of XML, Tim Berners-Lee states that an HTTP response which returns an XML document as either application/xml or text/xml should be processed according to the rules of the XML and XML namespaces recommendations, which state that the identity of an element is determined by its namespace name. So when processing an XML document, Tim asserts that it is self describing because one can locate the spec for the format from the namespace URI of the root element. Of course, Mark disagrees with this, but his reasons for doing so are pedantic spec lawyering. I disagree with it as well, but for different reasons. The main reason I disagree is that it puts a stake in the ground and says that any XML format on the Web that doesn't use a namespace name for its root element, or whose namespace name is not a dereferenceable URI that leads to a spec, is broken. This automatically implies that XML formats used on the Web today such as RSS 1.0, RSS 2.0, OPML and the Atom 0.3 syndication format are broken.
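As a rough illustration of the "start from the namespace name of the root element" idea, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The helper name `root_namespace` and both sample documents are my own assumptions for the example, and the namespace URI is a made-up example.org address rather than that of any real format.

```python
import xml.etree.ElementTree as ET

def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None

# Hypothetical documents: one with a namespaced root, one without.
namespaced_doc = '<feed xmlns="http://example.org/ns/feed"><title>Example</title></feed>'
plain_doc = '<rss version="2.0"><channel><title>Example</title></channel></rss>'

print(root_namespace(namespaced_doc))
print(root_namespace(plain_doc))
```

For the second document the function returns `None`, so a processor that relies solely on the root element's namespace name has nothing to look up. That is the situation formats without a root namespace find themselves in under this definition of "self describing".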

Mark then goes on to state in slide 20 that a problem with XML formats is that one can't arbitrarily extend an XML document without its schema or without breaking some application somewhere. It's unclear what he means by the document's schema, but I will grant that arbitrary additions to the expected content of an XML document are likely to break certain applications. Getting to slide 24, it is slightly clearer what Mark is getting at. He claims that although one can extend a format by adding extra elements from a known namespace using just XML technologies, this doesn't tell you how to deal with the extensions. On the other hand, with RDF the extensions are all concepts named with a URI whose meaning can then be looked up using HTTP GET. This is where he lost me. I don't see the difference between what he describes as the gains of using RDF and seeing a namespaced XML element in an XML format, then using HTTP GET on the namespace URI of that element to locate the spec or schema for the namespaced extension.
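The XML-only approach I have in mind can be sketched in a few lines of Python with `xml.etree.ElementTree`. This is only a sketch under stated assumptions: `KNOWN_NAMESPACES`, `unknown_extensions` and all the example.org URIs are hypothetical, and the code stops at collecting the namespace URIs one could then fetch with HTTP GET, without making any network request.

```python
import xml.etree.ElementTree as ET

# Namespaces this (hypothetical) application already understands.
KNOWN_NAMESPACES = {"http://example.org/ns/feed"}

def split_tag(tag):
    """Split ElementTree's "{uri}local" form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

def unknown_extensions(xml_text, known=KNOWN_NAMESPACES):
    """Map each unrecognized namespace URI to the element names found in it."""
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        uri, local = split_tag(elem.tag)
        if uri is not None and uri not in known:
            found.setdefault(uri, []).append(local)
    return found

# Hypothetical feed carrying two extension elements from another namespace.
doc = (
    '<feed xmlns="http://example.org/ns/feed" xmlns:geo="http://example.org/ns/geo">'
    "<title>Example</title>"
    "<geo:lat>47.6</geo:lat>"
    "<geo:long>-122.3</geo:long>"
    "</feed>"
)

extensions = unknown_extensions(doc)
for uri, names in extensions.items():
    print(uri, names)
```

An application can ignore everything reported here and still process the elements it knows, or it can dereference each reported URI to look for a spec or schema. In both cases the extension is identified by a URI, just as in the RDF case.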

The more I look at how RDF people bag on XML, the more it seems that they don't really write applications in today's world. In almost every situation where I've seen someone claim that RDF technologies will in the future be able to solve a problem XML cannot, the problem is not only solvable with XML technologies but is actually being solved using XML technologies today.