Microformats vs. XML vs. RDF - Dare Obasanjo's weblog

August 8, 2005

@ 01:47 PM

In response to my post "Using XML on the Web is Evil, Since When?", Tantek updated his post "Avoiding Plain XML and Presentational Markup". Since I'm the kind of person who can't avoid a good debate even when I'm on vacation, I've decided to post a response to Tantek's response. Tantek wrote:

The sad thing is that while namespaces theoretically addressed one of the problems I pointed out (calling different things by the same name), it actually WORSENED the other problem: calling the same thing by different names. XML Namespaces encouraged document/data silos, with little or no reuse, probably because every person/political body defining their elements wanted "control" over the definition of any particular thing in their documents. The <svg:a> tag is the perfect example of needless duplication.

And if something was theoretically supposed to have solved something but effectively hasn't 6-7 years later, then in our internet-time-frame, it has failed.

This is a valid problem in the real world. For example, for all intents and purposes an <atom:entry> element in an Atom feed is semantically equivalent to an <item> element in an RSS feed, as far as any feed reader that supports both is concerned. However, we have two names for what is effectively the same thing from the point of view of an aggregator developer or end user.

The XML solution to this problem has been that it is OK to have myriad formats as long as we have technologies such as XSLT for performing syntactic translations between XML vocabularies. The RDF solution is for us to agree on the semantics of the data in the format (i.e. a canonical data model for that problem space), in which case alternative syntaxes are fine and we perform translations using RDF-based mapping technologies like DAML+OIL or OWL. The microformat solution, which Tantek espouses, is for all of us to agree on both a canonical data model and a canonical syntax (typically some subset of [X]HTML).
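To make the XML approach concrete: XSLT is the usual tool, but the same kind of purely syntactic translation can be sketched in Python with the standard library's ElementTree. The sketch below rewrites an RSS 2.0 <item> as an Atom <entry>. The field mapping (guid to id, pubDate to updated, description to summary) is an illustrative assumption rather than an official crosswalk, and a real translator would have to handle many more elements.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

ATOM_NS = "http://www.w3.org/2005/Atom"


def rss_item_to_atom_entry(item: ET.Element) -> ET.Element:
    """Syntactically translate an RSS 2.0 <item> into an Atom <entry>.

    The element mapping here is an illustrative assumption, not a standard.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")

    title = item.findtext("title")
    if title is not None:
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title

    link = item.findtext("link")
    if link is not None:
        # Atom carries the URL in an href attribute instead of element text.
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=link)

    # Atom requires an id; fall back to the link when the item has no guid.
    guid = item.findtext("guid") or link
    if guid is not None:
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = guid

    pub_date = item.findtext("pubDate")
    if pub_date is not None:
        # RSS uses RFC 822 dates; Atom uses RFC 3339 timestamps.
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = (
            parsedate_to_datetime(pub_date).isoformat()
        )

    description = item.findtext("description")
    if description is not None:
        ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = description

    return entry


rss = """<item>
  <title>Microformats vs. XML vs. RDF</title>
  <link>http://example.org/post</link>
  <pubDate>Mon, 08 Aug 2005 13:47:00 GMT</pubDate>
</item>"""
ET.register_namespace("", ATOM_NS)
print(ET.tostring(rss_item_to_atom_entry(ET.fromstring(rss)), encoding="unicode"))
```

Note that the translation only moves names and values around; nothing in it records that the two vocabularies describe the same concept.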

So far the approach that has gotten the most traction in the real world is XML. From my perspective the reason for this is obvious: it doesn't require everyone to agree on a single data model or a single format for a given problem space.

Microformats don't solve the problem of different entities coming up with different names for the same concept. Instead, their proponents ignore the reasons the problem exists in the first place and offer microformats as a panacea, which they are not.

I personally haven't seen a good explanation of why <strong> is better than <b>...

A statement like that begs some homework. The accessibility, media independence, alternative devices, and web design communities have all figured this out years ago. This is Semantic (X)HTML 101. Please read any modern web design book like those on my SXSW Required Reading List, and we'll continue the discussion afterwards.

I can see the reasons for a number of the semantic markup guidelines in the case of HTML. What I don't agree with is jumping to the conclusion that markup languages should never have presentational markup. That amounts to arguing that every markup language that may be used as a presentation format should use CSS or invent a CSS equivalent, which I think is a stretch.

Finally, one has to seriously cast doubt on XML opinions on a page that is INVALID markup. I suppose following the XML-way, I should have simply stopped reading Dare's post as soon as I ran into the first well-formedness error. Only 1/2 ;)

The original permalink to Tantek's article broke after he made the edit. I guess since I couldn't find it, it doesn't exist. ;)

Categories: Web Development | XML


Monday, 08 August 2005 19:10:32 (GMT Daylight Time, UTC+01:00)

To me, the most interesting concept in all this mess is using baseline semantics of (X)HTML that are common and well-understood (ok, maybe "somewhat understood"). Rather than starting from scratch every time, you already have predefined and widely adopted semantic concepts, such as hyperlinks, lists, etc.

Dimitri Glazkov

Tuesday, 09 August 2005 11:47:36 (GMT Daylight Time, UTC+01:00)

What I think is interesting about the microformat concept is that it uses class names instead of element names. This allows a document author to tag the same element (say a 'div') as being BOTH an 'atom-entry' and a 'rss-item'.

Because an item can have multiple 'classes', overlapping naming schemes can live together in the same document.

It's type-based vs class-based markup. An element can have only one type, but multiple classes. I'd say class-based markup is more powerful because of that.
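The type-versus-class distinction can be illustrated with a small parser: each element has exactly one tag name, but its class attribute is a whitespace-separated list of tokens, so a single element can carry names from several vocabularies at once. This is a minimal sketch using Python's standard html.parser module; the class names atom-entry and rss-item are the hypothetical ones from the comment, not part of any published microformat.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Record each start tag's single element name and its set of class names."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[tuple[str, set[str]]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = dict(attrs).get("class") or ""
        # The class attribute is a whitespace-separated list of tokens.
        self.elements.append((tag, set(classes.split())))


collector = ClassCollector()
collector.feed('<div class="atom-entry rss-item">Hello</div>')
tag, classes = collector.elements[0]
print(tag, sorted(classes))  # div ['atom-entry', 'rss-item']
```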

Through classes, a document author can specify all the standardized ways the enclosed data may be interpreted. And if he wants to, he can use a non-standardized interpretation (including his own custom visual presentation) on top of that.

Microformats allow different standardized hierarchies to be merged together, as long as they're not conceptually conflicting with each other.

Taken to the extreme, one could imagine an XML format with only one element type (say 'div' or 'span'), using only classes to convey meaning. Such a format could support both Atom and RSS data (or any other rival format) in one document. A user agent could then choose any one of the vocabularies it understands. It could even allow the user to switch between different 'interpretations'. Wouldn't that be a plus?
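The single-element-type document imagined in this comment can be sketched as well. In this hypothetical scheme, each vocabulary marks its container with one class and its properties with a shared class prefix, and the user agent simply picks which vocabulary to read. The class names and the prefix convention are assumptions made up for illustration; the markup is kept well-formed so Python's xml.etree.ElementTree can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabularies: (container class, property class prefix).
VOCABULARIES = {
    "atom": ("atom-entry", "atom-"),
    "rss": ("rss-item", "rss-"),
}


def classes(el: ET.Element) -> set[str]:
    return set(el.get("class", "").split())


def extract(root: ET.Element, vocabulary: str) -> list[dict[str, str]]:
    """Read the document through one vocabulary, ignoring all the others."""
    container, prefix = VOCABULARIES[vocabulary]
    records = []
    for el in root.iter():
        if container not in classes(el):
            continue
        record = {}
        for child in el.iter():
            if child is el:
                continue  # skip the container itself
            for cls in classes(child):
                if cls.startswith(prefix):
                    record[cls[len(prefix):]] = (child.text or "").strip()
        records.append(record)
    return records


DOC = """<div>
  <div class="atom-entry rss-item">
    <span class="atom-title rss-title">Microformats vs. XML vs. RDF</span>
    <span class="atom-id rss-guid">urn:example:post-1</span>
  </div>
</div>"""
root = ET.fromstring(DOC)
print(extract(root, "atom"))  # [{'title': 'Microformats vs. XML vs. RDF', 'id': 'urn:example:post-1'}]
print(extract(root, "rss"))   # [{'title': 'Microformats vs. XML vs. RDF', 'guid': 'urn:example:post-1'}]
```

The same markup answers to both vocabularies; switching 'interpretations' is just a different argument to extract.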

Meryn

Wednesday, 10 August 2005 02:26:21 (GMT Daylight Time, UTC+01:00)

Great comments, Dare.

I especially liked your comparison of the "microformat", XML, and RDF approaches to the "two names for the same thing" problem.

That said, I think the "XML Solution" you describe isn't actually a solution to this problem. It seems to be more of a way of being OK with not solving the problem (which, arguably, can be a good thing).

Syntactic translation is only part of dealing with the problem, which ultimately comes down to having some common model between formats, or (in the microformat case) having a singular model-format.

The positive aspect of the XSLT approach seems to me to be that there is a "just in time" reconciliation of formats, e.g., the common model is only what you need it to be to reconcile the formats.

The negative aspect of the XSLT approach seems to me to be that it defuses the need to focus on the model, i.e., to reconcile the discrepancies between the models rather than between formats.
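Reconciling formats against a common model, rather than translating one format into another, might look like the following minimal Python sketch: both an RSS <item> and an Atom <entry> are parsed into one hypothetical Entry type, and code downstream of the parser never sees which format the data came from. The Entry fields are an assumption, and the Atom side naively takes the first <link> without checking its rel attribute.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

ATOM = "{http://www.w3.org/2005/Atom}"


@dataclass
class Entry:
    """Hypothetical common model that both formats are reconciled against."""
    title: Optional[str]
    link: Optional[str]
    ident: Optional[str]


def from_rss_item(item: ET.Element) -> Entry:
    link = item.findtext("link")
    # Fall back to the link when the item has no guid.
    return Entry(title=item.findtext("title"), link=link,
                 ident=item.findtext("guid") or link)


def from_atom_entry(entry: ET.Element) -> Entry:
    link_el = entry.find(f"{ATOM}link")  # simplification: first link only
    return Entry(title=entry.findtext(f"{ATOM}title"),
                 link=link_el.get("href") if link_el is not None else None,
                 ident=entry.findtext(f"{ATOM}id"))


def parse_entry(el: ET.Element) -> Entry:
    """Dispatch on the element name; callers only ever see Entry."""
    if el.tag == "item":
        return from_rss_item(el)
    if el.tag == f"{ATOM}entry":
        return from_atom_entry(el)
    raise ValueError(f"unsupported element: {el.tag}")


rss = ET.fromstring("<item><title>Hi</title><link>http://example.org/1</link></item>")
atom = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom"><title>Hi</title>'
    '<link href="http://example.org/1"/><id>http://example.org/1</id></entry>')
print(parse_entry(rss) == parse_entry(atom))  # True
```

With this arrangement, supporting another format means writing one more from_* function against the same Entry model, which keeps attention on the model rather than on pairwise format translations.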

Jay Fienberg
