Semantic Markup and HTML

One of the biggest gripes about HTML as the primary authoring format for Web documents has been that it focuses too much on presentation and not enough on semantics and content. This lack of semantics in HTML markup makes it hard to mechanically process Web documents. The common example of applications that get the short end of the stick is search engines. Using just HTML there is no way to differentiate via markup whether the words "Dare Obasanjo" refer to a name or an action.

Over time a number of seemingly arbitrary tags not strictly related to presentation, such as code, cite, and acronym, made it into the HTML tag set. However the bigwigs in the W3C realized that trying to define in HTML all the possible semantic tags people would want to use on the Web was the wrong approach, and took a different, two-part approach instead. The first part was to create a markup language for the Web that allowed users to create their own semantic tags; this markup language is XML.

The second part, which is still an area of ongoing research that most call the Semantic Web, is how to relate all these different pieces of semantic markup. It takes little imagination to realize that if markup-aware search engines know how to "Find all documents authored by Dare Obasanjo" by searching for <author>Dare Obasanjo</author> in documents, they will also need to be able to tell that documents containing <creator>Dare Obasanjo</creator> are also relevant. This is rather difficult and has resulted in a number of complicated Web ontology related technologies such as RDF, DAML+OIL, OWL, and more.
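To make the <author> versus <creator> problem concrete, here is a minimal Python sketch of a markup-aware search. It is only an illustration: the AUTHOR_TAGS table, the authored_by function and the three sample documents are all made up for this example, and a hand-written table of equivalent tag names is exactly the thing that does not scale, which is the gap the ontology technologies are trying to fill.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-maintained table of tag names that all mean "author".
# A real system would need some shared vocabulary instead of this set.
AUTHOR_TAGS = {"author", "creator"}


def authored_by(xml_text, name):
    """Return True if any author-like element contains exactly `name`."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag in AUTHOR_TAGS and (element.text or "").strip() == name:
            return True
    return False


# Made-up sample documents: two use different tags for the same idea,
# the third only mentions the name in its body text.
docs = {
    "a.xml": "<doc><author>Dare Obasanjo</author><body>On XML</body></doc>",
    "b.xml": "<doc><creator>Dare Obasanjo</creator><body>On Java</body></doc>",
    "c.xml": "<doc><body>Dare Obasanjo is mentioned here</body></doc>",
}

matches = sorted(
    name for name, text in docs.items() if authored_by(text, "Dare Obasanjo")
)
print(matches)
```

The third document mentions the name but is not matched, which is the distinction plain text search cannot make. Every new synonym for "author" means another entry in the table.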

Going back to Mark Pilgrim's post where he describes processing the <cite> tags on his site, he states
Let's try pushing the envelope of what HTML is actually designed to do, before we get all hot and bothered trying to replace it, mmmkay?
which seems to run counter to what his example shows and in fact is akin to saying "Look ma, if I put function pointers in a C struct I don't need an object oriented programming language". Mark's example hints at the kind of truly interesting things people could do with Web documents if they were actually marked up semantically and not just a mass of <b>, <font> and <br> tags. I for one would have preferred handling semantic markup when I wrote code to convert my K5 diary page to an RSS feed or when I wrote the K5 story parser for the K5 user search engine. Mark's parting words completely contradict the feature he has added to his website and in fact are an example of why HTML bears replacing.
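For a feel of how little code semantic markup needs, here is a rough Python sketch that collects the contents of <cite> tags from a page. This is not Mark's actual code, which I am not reproducing here; the CiteCollector class, the extract_citations function and the sample markup are invented for illustration, using only the standard library's html.parser module.

```python
from html.parser import HTMLParser


class CiteCollector(HTMLParser):
    """Collects the text inside every <cite> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # how many <cite> elements we are currently inside
        self._buffer = []    # text fragments of the citation being read
        self.citations = []  # finished citations, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "cite" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.citations.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside a <cite>; nested tags such as <i>
        # are skipped but their text is preserved.
        if self._depth:
            self._buffer.append(data)


def extract_citations(html_text):
    collector = CiteCollector()
    collector.feed(html_text)
    collector.close()
    return collector.citations


# Made-up sample markup.
sample = (
    "<p><cite>Mark Pilgrim</cite> and <cite>Dare <i>Obasanjo</i></cite> "
    "are cited; <b>this</b> is only bold.</p>"
)
print(extract_citations(sample))
```

The <b> text is ignored because nothing in the markup says what it means, while the <cite> text falls out with no guessing at all.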

Now for some clarification. The above comments do not make me a semantic web advocate, nor do they indicate that my previous thoughts about XHTML have changed. I personally think the semantic web is a pipe dream in much the same way "we will have real AI in the next 20 years" was a pipe dream a few decades ago. However, in the same way those AI researchers ended up giving us Lisp, which brought us Emacs (M-x all-hail-emacs), it is likely the semantic web folks will inadvertently produce some really cool technology. Also, Google has been doing the Semantic Web thing without needing people to alter their existing documents, understand complex specifications or make semantic web related decisions when authoring documents.


Mac Addicts

Doug admits to being a Mac addict. His post reminded me of the Mac addiction article on Wired. I was amused by the fact that the article states
What makes Mac users so loyal?
The answer, of course, depends on who is asked:
But some common themes emerge: community, the alternative to Microsoft, and the brand, which connotes nonconformity, liberty and creativity.
which makes me wonder about Doug. :)

He isn't the only one who I've seen bitten by the lifestyle ad when it comes to geek passions. Most of the kids running Linux when I was at school did so because it was the "geek thing to do" and not for any reasons they could argue coherently. I also suspect this is the root of why I started using Emacs although that has long been superseded by all the cool shit I can actually do with it.


Article Translations

The Web can surprise the heck out of you sometimes. I still can't get over the fact that there is a French translation of my C# vs. Java article and a Chinese translation of my interview with Miguel. It is weirdly humbling to see my words translated and spread to an entire audience I had not anticipated or expected to serve when I originally wrote the articles. Truly, a World Wide Web.


Get yourself a News Aggregator and subscribe to my RSS feed

Disclaimer: The opinions in this diary are my own and do not reflect the opinions, thoughts, intentions or strategies of my employer.

