November 11, 2003
@ 11:10 PM

I noticed an RDF Interest Group IRC chat log discussing my recent post More on RDF, The Semantic Web and Perpetual Motion Machines in my referrer logs. I found the following excerpts quite illuminating:

15:43:42 <f8dy> is owl rich enough to be able to say that my <pubDate>Tue, Nov 11, 2003</pubDate> is the same as your <dc:date>2003-11-11</dc:date>

15:44:35 <swh> shellac: I believe that XML datatypes are...

...

16:08:15 <f8dy> that vocabulary also uses dates, but it stores them in rfc822 format

16:08:51 <f8dy> 1. how do i programmatically determine this?

16:08:58 <JHendler> ah, but you cannot merge graphs on things without the same URI, unless you have some other way to do it

16:09:02 <f8dy> 2. how do i programmatically convert them to a format i understand?

...

16:09:40 <shellac> 1. use

...

16:10:13 <shellac> 1. use a xsd library

16:10:32 <shellac> 2. use an xsd library

...

16:11:08 <JHendler> n. use an xsd library :->

16:11:30 <shellac> the graph merge won't magically merge that information, true

16:11:34 <JHendler> F: one of my old advisors used to say the only thing better than a strong advocate is a weak critic

This argument cements my suspicion that using RDF and Semantic Web technologies is a losing proposition compared to using XML-centric technologies for information interchange on the World Wide Web. It is quite telling that none of the participants who tried to counter my arguments gave a cogent response besides "use an xsd library" when in fact anyone with a passing knowledge of XSD would inform them that XSD only supports ISO 8601 dates and would barf on RFC 822 dates if asked to treat them as dates. In fact, this is a common complaint about XSD dates from our customers w.r.t. internationalization [that, and the fact that decimals use a period as the delimiter for fractional digits instead of a comma].

Even in this simple case of mapping equivalent elements (dc:date and pubDate) the Semantic Web advocates cannot explain how their vaunted ontologies solve a problem the average RSS aggregator author solves in about five minutes of coding using off-the-shelf XML tools. It is easy to say philosophically that dc:date and pubDate are equivalent since, after all, they are both dates, but it is quite another thing to write code that knows how to treat them uniformly. I am quite surprised that such a straightforward real-world example cannot be handled by Semantic Web technologies. Clay Shirky's The Semantic Web, Syllogism, and Worldview makes even more sense now.
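For the record, here is roughly what those five minutes of coding look like in C#, the language RSS Bandit is written in. This is only a sketch under my own assumptions: the method name is mine, and a production version would need to handle more RFC 822 timezone variants than the single pattern shown. Note that XmlConvert, the class that implements the XSD datatypes in .NET, happily parses the ISO 8601 string and throws on the RFC 822 one, which is exactly the point about XSD libraries above.

    using System;
    using System.Globalization;
    using System.Xml;

    class DateNormalizer
    {
        // Parses either a dc:date (ISO 8601) or a pubDate (RFC 822) value.
        // XmlConvert speaks XSD dates, so it accepts the former and throws
        // a FormatException on the latter, hence the fallback.
        public static DateTime ParseFeedDate(string value)
        {
            try
            {
                return XmlConvert.ToDateTime(value);
            }
            catch (FormatException)
            {
                // e.g. "Tue, 11 Nov 2003 23:10:00 GMT"; real-world feeds
                // need more timezone patterns than just 'GMT'.
                return DateTime.ParseExact(value,
                    "ddd, dd MMM yyyy HH:mm:ss 'GMT'",
                    CultureInfo.InvariantCulture);
            }
        }
    }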

One of my co-workers recently called RDF an academic plaything. After seeing how many of its advocates ignore the difficult real-world problems faced by software developers and computer users today while pretending that obtuse solutions to trivial problems are important, I've definitely lost whatever interest I had left in investigating the Semantic Web any further.


 

Categories: XML

November 11, 2003
@ 03:40 PM

From the Memory Hole

The Memory Hole posted an extract from an essay by George Bush Sr. and Brent Scowcroft, in which they explain why they didn't have the military push into Iraq and topple Saddam during Gulf War 1. Although there are differences between the Iraq situations in 1991 and 2002-3, Bush's key points apply to both.

But a funny thing happened. Fairly recently, Time pulled the essay off of their site. It used to be at this link, which now gives a 404 error. If you go to the table of contents for the issue in which the essay appeared (2 March 1998), "Why We Didn't Remove Saddam" is conspicuously absent.

Ever since September 11, 2001 the news continues to sound more and more like excerpts from George Orwell's 1984. All is not lost though; it has been heartening to see that some teachers are using this incident as a way to teach their students about media literacy. My favorite is Rewriting History: The Dangers of Digitized Research by Peg Hesketh.


 

Categories: Ramblings

I always love the Top 50 IRC Quotes. Warning: some of them are a bit risqué.


 

Categories: Ramblings

My post from yesterday garnered a couple of responses from the RDF crowd who questioned the viability of the approaches I described. Below I take a look at some of their arguments and relate them to practical examples of exchanging information using XML that I have encountered in my regular development cycle.

Shelley Powers writes

One last thing: I wanted to also comment on Dare Obasanjo's post on this issue. Dare is saying that we don't need RDF because we can use transforms between different data models; that way everyone can use their own XML vocabulary. This sounds good in principle, but from previous experience I've had with this type of effort in the past, this is not as trivial as it sounds. By not using an agreed on model, not only do you now have to sit down and work out an agreement as to differences in data, you also have to work out the differences in the data model, too. In other words -- you either pay upfront, once; or you keep paying in the end, again and again. Now, what was that about a Perpetual Motion Machine, Dare?

In responding to Shelley's post it is easier for me to use a concrete example. RSS Bandit uses a custom format that I came up with for describing a user's list of subscribed feeds. However, in the wild, other news aggregators use differing formats such as OPML and OCS. To ensure that users who've used other aggregators can try out RSS Bandit without having to manually enter all their feeds, I support importing feed subscription lists in both the OPML and OCS formats even though they are distinct from the format and data model I use internally. This importation is done by applying an XSLT stylesheet to the input OPML or OCS file to convert it to my internal format, then converting that XML into the RSS Bandit object model. The stylesheets took me about 15 to 30 minutes each to write. This is the XML-based solution.
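For the curious, the import path looks roughly like the following sketch, which uses the .NET XslTransform class; the stylesheet file name is a placeholder rather than the actual resource name inside RSS Bandit.

    using System.Xml;
    using System.Xml.XPath;
    using System.Xml.Xsl;

    class FeedListImporter
    {
        // Run the foreign subscription list (OPML in this case) through the
        // stylesheet that emits the internal feed list format, then load the
        // result so it can be converted into the object model.
        public static XmlDocument ImportOpml(string opmlFile)
        {
            XslTransform transform = new XslTransform();
            transform.Load("opml2feedlist.xsl"); // the 15-30 minute stylesheet

            XPathDocument input = new XPathDocument(opmlFile);
            XmlDocument feedList = new XmlDocument();
            feedList.Load(transform.Transform(input, null));
            return feedList;
        }
    }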

Folks like Shelley believe my problem could be better solved by RDF and other Semantic Web technologies. For example, if my internal format was RDF/XML and I was trying to import an RDF-based format such as OCS then instead of using a language like XSLT that performs a syntactic transform of one XML format to the other I'd use an ontology language such as OWL to map between the data models of my internal format and OCS. This is the RDF-based solution.

Right off the bat, it is clear that both approaches share certain drawbacks. In both cases, I have to come up with a transformation from one representation of a feed list to another. Ideally, for popular formats there would be standard transformations described by others for moving from one popular format to another (e.g. I don't have to write a transformation for WordML to HTML but do for WordML to my custom document format) so developers who stick to popular formats simply have to locate the transformation as opposed to actually authoring it themselves.

However, the semantics-based approach has further drawbacks that the XML-based syntactic approach does not. In certain cases, where the mapping isn't merely a matter of showing equivalencies between the semantics of similarly structured elements (e.g. the equivalent of element renaming, such as stating that a url and a link element are equivalent), an ontology language is insufficient while a Turing-complete transformation language like XSLT is not. A good example comes, again, from RSS Bandit. In various RSS 2.0 feeds there are two popular ways to specify the date an item was posted: the first is the pubDate element, which is described as containing a string in the RFC 822 format, while the other is the dc:date element, which is described as containing a string in the ISO 8601 format. Thus even though both elements are semantically equivalent, syntactically they are not. This means that a syntactic transformation still needs to be applied after the semantic transformation if one wants an application to treat pubDate and dc:date as equivalent. So instead of making one pass with an XSLT stylesheet to perform the transformation in the XML-based solution, two transformation techniques will be needed in the RDF-based solution, and it is quite likely that one of them would be XSLT.
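In other words, even in an all-RDF world a value fix-up like the hypothetical one below still has to run after the graphs are merged; the ontology mapping buys you element equivalence but not value equivalence. As before, this is a sketch with a single hard-coded timezone pattern.

    using System;
    using System.Globalization;
    using System.Xml;

    class DateFixup
    {
        // After OWL has asserted that pubDate and dc:date are equivalent,
        // the RFC 822 literal still has to be rewritten into ISO 8601
        // before the two values actually compare or sort uniformly.
        public static string Rfc822ToIso8601(string rfc822Date)
        {
            DateTime parsed = DateTime.ParseExact(rfc822Date,
                "ddd, dd MMM yyyy HH:mm:ss 'GMT'",
                CultureInfo.InvariantCulture);
            return XmlConvert.ToString(parsed); // an XSD dateTime string
        }
    }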

The other practical concern is that I already know XSLT and have good books to choose from to learn more about it, such as Michael Kay's XSLT: Programmer's Reference and Jeni Tennison's XSLT and XPath On The Edge, as well as mailing lists such as xsl-list where experts can help answer tough questions.

From where I sit picking an XML-based solution over an RDF-based one when it comes to dealing with issues involving interchange of XML documents just makes a lot more sense. I hope this post helps clarify my original points.

Ken MacLeod also wrote

In his article, Dare suggests that XSLT can be used to transform to a canonical format, but doesn't suggest what that format should be or that anyone is working on a common, public repository of those transforms.

The transformation is to whatever target format the consumer is comfortable dealing with. In RSS Bandit the transformations are OCS/OPML to my internal feed list format and RSS 1.0 to RSS 2.0. There is no canonical transformation to one Über XML format that will solve everyone's problems. As for keeping a common, public repository of such transformations, that is an interesting idea which I haven't seen anyone propose in the past. A publicly accessible database of XSLT stylesheets for transforming between RSS 1.0 and RSS 2.0, from WordML to HTML, etc. would be a useful addition to the XML community.

Sam Ruby muddies the waters in his post Blind Spots and subsequent comments in that thread by confusing the use cases around XML as a data interchange format with those around XML as a data storage format. My comments above have been about XML as a data interchange format; I'll probably post more in the future about RDF vs. XML as a data storage format using the thread in Sam's blog for context.


 

Categories: XML

Ken MacLeod writes

Clay Shirky criticizes the Semantic Web in his article, The Semantic Web, Syllogism, and Worldview, to which Sam Ruby accurately assesses, "Two parts brilliance, one part strawman."

Joe Gregorio responds to Shirky's piece with this very concrete statement:

This is exactly the point I made in The Well-Formed Web, that the value that the proponents of the Semantic Web were offering could be achieved just as well with just XML and HTTP, and we are doing it today with no use of RDF, no need to wait for ubiquitous RDF deployment, no need to wait for RDF parsing and querying tools.

Yet, in the "just XML" world there is no one that I know of working on a "layer" that lets applications access a variety of XML formats (schemas) and treat similar or even logically equivalent elements or structures as if they were the same. This means each XML application developer has to do all of the work of integrating each XML format (schema): N × M.

The difference between the RDF proponents and the XML proponents is fairly simple. In the XML-centric world, parties can utilize whatever internal formats and data sources they want but exchange XML documents that conform to an agreed-upon format; in cases where the agreed-upon format conflicts with internal formats, technologies like XSLT come to the rescue. The RDF position is that it is too difficult to agree on interchange formats, so instead of going down this route we should use A.I.-like technologies to map between formats. Note that this doesn't mean transformations don't need to be done, as Ken points out:

The RDF model along with the logic and equivalency languages, like OWL (nee DAML+OIL),

Thus, if you are an XML practitioner, RDF doesn't change much except that there are new transformation techniques and technologies to learn.

Additionally, as Clay Shirky points out, on investigation it isn't even clear whether the basic premises of RDF and similar Semantic Web technologies are based on a firm foundation and sound logic. In conclusion, Ken wrote:

One can take potshots at RDF for how it addresses the problem, and the Semantic Web for possibly reaching too far too quickly in making logical assertions based on relations modeled in RDF, but to dismiss it out of hand or resort to strawmen to attack it all while not recognizing the problem it addresses or offering an alternative solution simply tells me they don't see the problem, and therefore have no credibility in knocking RDF or the Semantic Web for trying to solve it.

I wonder if I'm the only one that sees the parallels between the above quote and statements attributed to religious fundamentalists. I wonder if Ken is familiar with Perpetual Motion Machines? The problem they aim to solve is real, albeit impossible to solve. Does he also feel that no one has the credibility to knock any one of the numerous designs that have been proposed until the critic can produce a perpetual motion machine themselves?


 

Categories: XML

November 10, 2003
@ 03:42 AM

Say hello to Mike Deem


 

Categories: Life in the B0rg Cube

November 10, 2003
@ 03:40 AM

I saw the Regina Carter Quintet at Dimitrou's Jazz Alley last night and I must say I never thought a jazz violinist would sound so good. If you live in the Seattle area and have never checked out the Jazz Alley you need to give it a try. It's definitely the place to go for a nice night of good music and good food with a loved one.

 


 

Categories: Ramblings

Miguel De Icaza recently wrote

To make Linux a viable platform for mainstream desktop use our community needs to realize the importance of these third-party vendors and not alienate them. Having a stable API, and a stable ABI is very important for this reason. GNOME has learned this lesson and has strict commitments on ABI/API stability (thanks to our friends at Sun that pushed for this) and the XFree folks deserve the credit for making ABI compatibility across operating systems a reality for their drivers. Two good steps in the right direction.

This highlights one of the primary differences between amateur and professional software development. To professional software developers, phrases like "backwards compatibility" and "no breaking changes" are words to live by regardless of how painful this can be at times; to amateur software developers, they are the equivalent of four-letter words to be ignored in the quest to build something "faster, better and cheaper" than the rest.

 

An example of this just hit me in RSS Bandit development. I recently released RSS Bandit v1.2.0.42 which added

The ability to store and retrieve feed list from remote locations such as a dasBlog blog, an FTP server or a network file share. This enables users utilizing RSS aggregators on multiple machines to synchronize their feed list from a single point. This feature has been called a subscription harmonizer by some.

Around the time I shipped this release a new version of dasBlog (which is managed by Clemens Vasters) shipped and removed the XML Web Service end points that RSS Bandit was dependent upon for this feature. To see the difference you can compare the ConfigEditingService.asmx from my weblog to the ConfigEditingService.asmx on Torsten's weblog. Note that they are almost completely different. In the space of one release, dasBlog broke a feature in RSS Bandit and any other client that was written to target those end points.

Behavior like this is the bane of the Open Source world that Miguel decried. It's always so easy to change the source code and tell people to download the latest bits that few ever think of the churn they cause by indiscriminately breaking applications; after all, users can always download the newest bits or, in the worst case, tweak the code themselves.

Instead of finishing up the perf improvements I had in mind for RSS Bandit, I now have to figure out how to support both dasBlog APIs in a seamless manner without placing the burden on end users. You'll note that both my blog and Torsten's state that they are running dasBlog v1.3.3266.0 even though they expose different XML Web Service end points, which means I can't even expect users to be able to tell what version of dasBlog they are running with any accuracy.
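The least bad fix I can think of is to sniff the server before calling it, along the lines of the hypothetical probe below: pull down the service description and check whether an operation from the old contract is still advertised. The operation name is a stand-in, not the actual dasBlog contract name.

    using System.IO;
    using System.Net;

    class DasBlogProbe
    {
        // Fetch the WSDL for the .asmx endpoint and look for an operation
        // from the old contract. "GetFeedList" is a placeholder name.
        public static bool SupportsOldApi(string endpointUrl)
        {
            WebRequest request = WebRequest.Create(endpointUrl + "?WSDL");
            using (StreamReader reader = new StreamReader(
                request.GetResponse().GetResponseStream()))
            {
                return reader.ReadToEnd().IndexOf("GetFeedList") != -1;
            }
        }
    }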

What a waste of a Sunday evening...


 

Categories: RSS Bandit

Joe Gregorio has a new blog post entitled Longhorn versus the light of day where he makes some valid points about three of the major technologies mentioned at Microsoft's Professional Developers Conference (namely Indigo, Avalon and WinFS), such as the fact that they are anywhere from two to three years from shipping and that it'll be a number of years after that until their usage is widespread enough to make a significant difference. However, he also claims that the technologies aren't even worthwhile; specifically he states

Three major components, all useless in their own right, now stacked together on a shaky foundation to make an even more useless heap.

Now this statement doesn't seem to jibe with any of the opinions I've seen from people who actually develop for Windows and have heard of the three aforementioned technologies. Below are my one-sentence summaries of all three:

  • Avalon: A set of new UI widgets for the operating system and a managed programming model for accessing them.
  • Indigo: The next generation of DCOM except this time it is built on open standards and will interop with any platform with an XML parser as opposed to the old days when it was mainly a Windows<->Windows thing.
  • WinFS: A consistent way to add metadata to files and query it.

As a hobbyist Windows developer who's built a Windows rich client application using C# (i.e. RSS Bandit), I find all three of the above very useful advances.

  • Avalon: Now I can actually build a rich UI in C# without (a) having to use P/Invoke or COM interop to talk to native Windows widgets or (b) using a third-party library that uses P/Invoke or COM interop to talk to native Windows widgets. Currently for RSS Bandit I use both, and the free third-party UI library I was using (the Magic Library) just went commercial ($200 a pop). I don't see why, after paying money for Windows and Visual Studio, I still have to cough up dollars or jump through hoops before I can build a UI with dockable windows. Anything Microsoft does to remedy this is a good thing as far as I'm concerned.

  • Indigo: RSS Bandit already uses Web Services to talk to servers; for instance, for subscription harmonization, where I can post my feed list to my dasBlog weblog, I talk to the ConfigEditingService it exposes. However there is a lot more I'd like RSS Bandit to be able to do when communicating with weblogs, such as querying what services they support (i.e. WS-Policy) and then, when communication occurs, doing so in a flexible yet secure manner (i.e. WS-Security, WS-SecureConversation and WS-Authorization). Indigo will give me all this as operating system facilities that will interop with any platform where an implementation of these specs exists.

  • WinFS: The most frequently requested feature for RSS Bandit is the ability for users to search the feeds they've downloaded to disk. Currently I have two choices here: store all the XML in objects then loop over them to perform searches, or store the feeds on disk as XML then fire up an XPath engine which loops over each directory and queries each file. The first is fairly memory intensive while the second is slow (a sketch of it follows this list). I suspect SharpReader does the former, which explains why people regularly complain about the fact that it sometimes uses over 100MB of memory. With WinFS I'll be able to do the latter with performant query handled by the OS, while RSS Bandit supplies the metadata to search against.
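Here is roughly what that disk-based option looks like today; the cache layout and the XPath expression are illustrative rather than RSS Bandit's actual ones. Re-parsing every file on every search is exactly the slow part an indexed store like WinFS would take over.

    using System;
    using System.IO;
    using System.Xml.XPath;

    class FeedSearch
    {
        // Walk the feed cache and query each file with XPath. Slow because
        // every file gets re-parsed from scratch on every search.
        public static void Search(string cacheDir, string searchTerm)
        {
            foreach (string file in Directory.GetFiles(cacheDir, "*.xml"))
            {
                XPathDocument doc = new XPathDocument(file);
                XPathNodeIterator hits = doc.CreateNavigator().Select(
                    "//item[contains(description, '" + searchTerm + "')]");
                while (hits.MoveNext())
                    Console.WriteLine(file + ": " + hits.Current.Value);
            }
        }
    }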

Even to a hobbyist developer working with Windows technologies, all three of the technologies Joe Gregorio calls useless will be of use. I can only assume that Mr. Gregorio is not a Windows developer, which would explain his point of view.


 

Categories: Life in the B0rg Cube

November 7, 2003
@ 03:23 PM

I've posted previously on why I think the recent outcry for the W3C to standardize on a binary format for the representation of XML information sets (aka "binary XML") is a bad idea that could cause significant damage to interoperability on the World Wide Web. Specifically, I wrote:

Binary XML Standard(s): Just Say No

Omri and Joshua have already posted the two main reasons why attempting to create a binary XML standard is folly: (a) the various use cases and requirements are contradictory (small message size for low bandwidth situations vs. minimal parsing/serialization time for situations where minimizing processing time is prime), thus a single standard is unlikely to satisfy a large proportion of the requesters, and (b) creation of a binary XML standard, especially by an organization such as the W3C, muddies the water with regards to interop; people already have to worry about the interop pain that will occur whenever XML 1.1 gets out of the door (which is why Elliotte Rusty Harold advises avoiding it like the plague), let alone adding one or more binary XML standards to the mix.

I just read the report from the W3C Workshop on Binary Interchange of XML Information Item Sets and I'm glad to see the W3C did not [completely] bow to pressure from certain parties to start work on a "binary XML" format. The following is the conclusion from the workshop:

CONCLUSIONS

The Workshop concluded that the W3C should do further work in this area, but that the work should be of an investigative nature, gathering requirements and use cases, and prepare a cost/benefit analysis; only after such work could there be any consideration of whether it would be productive for W3C to attempt to define a format or method for non-textual interchange of XML.

See also Next Steps below for the conclusions as they were stated at the end of the Workshop.

This is new ground for the W3C. Usually W3C working groups are formed to take competing requirements from umpteen vendors and hash out a spec. Of course, the problem with this approach is that it doesn't scale. It may have worked for HTML when the competing requirements primarily came from two vendors, but now that XML is so popular it doesn't work quite as well; as Tim Bray put it, "any time there's a new initiative around XML, there are instantly 75 vendors who want to go on the working group".

It's good to see the W3C take an exploratory approach instead of just forging ahead to create a spec that tries to satisfy myriad competing and contradictory requirements. They've gone the forge-ahead route before with W3C XML Schema (and to a lesser extent with XQuery) and the software industry is still having difficulty digesting the results. Hopefully at the end of their investigation they'll come to the right conclusions.


 

Categories: XML