It looks like it's confirmed that I'll be attending XML 2003.

Should be fun.


 

Categories: Ramblings

Oleg Tkachenko writes

Just found new beast in the Longhorn SDK documentation - OPath language:

The OPath language is the query language used to query for objects using an ObjectSpace. The syntax of OPath also allows you to query for objects using standard object oriented syntax. OPath enables you to traverse object relationships in a query as you would with standard object oriented application code and includes several operators for complex value comparisons.

Orders[Freight > 5].Details.Quantity > 50 OPath expression should remind you something familiar. Object-oriented XPath cross-breeded with SQL? Hmm, xml-dev flamers would love it.

The approach seems to be exactly opposite to ObjectXPathNavigator's one - instead of representing object graphs in XPathNavigable form, brand new query language is invented to fit the data model. Actually that makes some sense, XPath as XML-oriented query language can't fit all. I wonder what Dare think about it. More studying is needed, but as for me (note I'm not DBMS-oriented guy though) it's too crude yet

Oleg is right that an XML-oriented query language like XPath doesn't fit for querying objects. There are definitely impedance mismatches between XML and objects, a good number of which were pointed out by Erik Meijer in his paper Programming with Circles, Triangles and Rectangles. A significant number of XPath's constructs and semantics simply don't make sense in a language designed to query objects. The primary construct in XPath is the location step, which consists of an axis, a node test and zero or more predicates; both the axis and the node test are out of place in an object query language.
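To make the anatomy of a location step concrete, here is a minimal sketch using Python's xml.etree.ElementTree. Note that ElementTree deliberately implements only the abbreviated child/descendant axes, which itself hints at how document-centric the full axis set is; the sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Orders>"
    "<Order id='1'><Freight>3</Freight></Order>"
    "<Order id='2'><Freight>7</Freight></Order>"
    "</Orders>"
)

# 'Order' is a complete location step: the implied child:: axis,
# the node test 'Order', and no predicate.
all_orders = doc.findall("Order")

# Adding a predicate: the abbreviated form of child::Order[@id='2'].
heavy = doc.findall("Order[@id='2']")

print(len(all_orders), heavy[0].find("Freight").text)  # 2 7
```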

From the XPath grammar, there are 13 axes, almost none of which make sense for objects besides self. They are listed below:

[6]    AxisName    ::=    'ancestor'
| 'ancestor-or-self'
| 'attribute'
| 'child'
| 'descendant'
| 'descendant-or-self'
| 'following'
| 'following-sibling'
| 'namespace'
| 'parent'
| 'preceding'
| 'preceding-sibling'
| 'self'

The ones related to document order such as preceding, following, preceding-sibling and following-sibling don't really apply to objects since there is no concept of order amongst the properties and fields of a class. The attribute axis is similarly inapplicable since there is no equivalent of the element/attribute distinction among the fields and properties of a class.

The axes related to document hierarchy such as parent, child, ancestor and descendant look like they may map to object oriented concepts, until one asks what exactly is meant to be the parent of an object. Is it the base class, or the object to which the current object belongs as a field or property? Most would respond that it is the latter. However, what happens when multiple objects have the same object as a field, which is often the case since object structures are graph-like, not tree-like as XML structures are? It also gets tricky when an object that is a field in one class is a member of a collection in another class. Is the object a child of the collection? If so, what is the parent of the object? If not, what is the relationship of the object to the collection? The questions can go on...
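A tiny illustration of the point, with invented class names: the same instance can be a field of two different owners at once, so unlike an XML element it has no single parent for a parent axis to return.

```python
class Address:
    def __init__(self, city):
        self.city = city

class Customer:
    def __init__(self, name, address):
        self.name = name
        self.address = address

home = Address("Seattle")
a = Customer("Dare", home)
b = Customer("Torsten", home)   # the very same Address object, shared

# Both customers hold the identical instance: which one is its parent?
print(a.address is b.address)   # True
```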

On the surface the namespace axis sounds like it could map to concepts from object oriented programming, since languages like C#, C++ and Java all have a concept of a "namespace". However, namespace nodes in the XPath data model have distinct characteristics with no object oriented analog, such as the fact that each element node in a document has its own set of namespace nodes even when those nodes all represent the same mapping of a prefix to a namespace URI.

A similar argument can be made about node tests, the second primary construct in XPath location steps. A node test specifies either a name or a type of node to match. A number of XPath node types, such as comment and processing instruction nodes, don't have equivalents in the object oriented world. Other node types, such as text and element nodes, become problematic when one tries to tie them in to the various axes such as the parent axis.

Basically, a significant amount of XPath is not really applicable to querying objects without changing the semantics of certain aspects of the language in a way that conflicts with how XPath is used when querying XML documents.

As for how this compares to my advocacy of XML-to-object mapping techniques such as the ObjectXPathNavigator, the answer is simple: XML is the universal data interchange format, and the software world is moving to a situation where all the major sources of important data, from office documents to network messages to information locked within databases, can be accessed or viewed as XML. It makes sense, then, that in creating this universal data access layer one creates a way for all interesting sources of data to be viewed as XML, so they too can participate as input for data aggregation technologies such as XSLT or XQuery and benefit from the reuse of XML technologies for processing and manipulating them.


 

Categories: Life in the B0rg Cube | XML

November 11, 2003
@ 11:10 PM

I noticed an RDF Interest Group IRC chat log discussing my recent post More on RDF, The Semantic Web and Perpetual Motion Machines in my referrer logs. I found the following excerpts quite illuminating:

15:43:42 <f8dy> is owl rich enough to be able to say that my <pubDate>Tue, Nov 11, 2003</pubDate> is the same as your <dc:date>2003-11-11</dc:date>

15:44:35 <swh> shellac: I believe that XML datatypes are...

...

16:08:15 <f8dy> that vocabulary also uses dates, but it stores them in rfc822 format

16:08:51 <f8dy> 1. how do i programmatically determine this?

16:08:58 <JHendler> ah, but you cannot merge graphs on things without the same URI, unless you have some other way to do it

16:09:02 <f8dy> 2. how do i programmatically convert them to a format i understand?

...

16:09:40 <shellac> 1. use

...

16:10:13 <shellac> 1. use a xsd library

16:10:32 <shellac> 2. use an xsd library

...

16:11:08 <JHendler> n. use an xsd library :->

16:11:30 <shellac> the graph merge won't magically merge that information, true

16:11:34 <JHendler> F: one of my old advisors used to say the only thing better than a strong advocate is a weak critic

This argument cements my suspicion that using RDF and Semantic Web technologies is a losing proposition when compared to using XML-centric technologies for information interchange on the World Wide Web. It is quite telling that none of the participants who tried to counter my arguments gave a cogent response besides "use an xsd library", when in fact anyone with a passing knowledge of XSD would inform them that XSD only supports ISO 8601 dates and would barf if asked to treat RFC 822 strings as dates. In fact, this is a common complaint about XSD from our customers with respect to internationalization [that, and the fact that decimals use a period as a delimiter instead of a comma for fractional digits].

Even in this simple case of mapping equivalent elements (dc:date and pubDate), the Semantic Web advocates cannot show how their vaunted ontologies solve a problem the average RSS aggregator author solves in about 5 minutes of coding using off-the-shelf XML tools. It is easy to say philosophically that dc:date and pubDate are equivalent since, after all, they are both dates; it is another thing to write code that knows how to treat them uniformly. I am quite surprised that such a straightforward real-world example cannot be handled by Semantic Web technologies. Clay Shirky's The Semantic Web, Syllogism, and Worldview makes even more sense now.

One of my co-workers recently called RDF an academic plaything. After seeing how many of its advocates ignore the difficult real world problems faced by software developers and computer users today while pretending that obtuse solutions to trivial problems are important, I've definitely lost any interest I had left in investigating the Semantic Web any further.


 

Categories: XML

November 11, 2003
@ 03:40 PM

From the Memory Hole

The Memory Hole posted an extract from an essay by George Bush Sr. and Brent Scowcroft, in which they explain why they didn't have the military push into Iraq and topple Saddam during Gulf War 1. Although there are differences between the Iraq situations in 1991 and 2002-3, Bush's key points apply to both.

But a funny thing happened. Fairly recently, Time pulled the essay off of their site. It used to be at this link, which now gives a 404 error. If you go to the table of contents for the issue in which the essay appeared (2 March 1998), "Why We Didn't Remove Saddam" is conspicuously absent.

Ever since September 11, 2001 the news continues to sound more and more like excerpts from George Orwell's 1984. All is not lost though; it has been heartening to see that some teachers are using this incident as a way to teach their students about media literacy. My favorite is Rewriting History: The Dangers of Digitized Research by Peg Hesketh.


 

Categories: Ramblings

I always love the Top 50 IRC Quotes. Warning, some of them are a bit risqué.


 

Categories: Ramblings

My post from yesterday garnered a couple of responses from the RDF crowd, who questioned the viability of the approaches I described. Below I take a look at some of their arguments and relate them to practical examples of exchanging information using XML that I have encountered in my regular development work.

Shelley Powers writes

One last thing: I wanted to also comment on Dare Obasanjo's post on this issue. Dare is saying that we don't need RDF because we can use transforms between different data models; that way everyone can use their own XML vocabulary. This sounds good in principle, but from previous experience I've had with this type of effort in the past, this is not as trivial as it sounds. By not using an agreed on model, not only do you now have to sit down and work out an agreement as to differences in data, you also have to work out the differences in the data model, too. In other words -- you either pay upfront, once; or you keep paying in the end, again and again. Now, what was that about a Perpetual Motion Machine, Dare?

In responding to Shelley's post it is easier for me to use a concrete example. RSS Bandit uses a custom format that I came up with for describing a user's list of subscribed feeds. However, in the wild other news aggregators use differing formats such as OPML and OCS. To ensure that users who've used other aggregators can try out RSS Bandit without having to manually enter all their feeds, I support importing feed subscription lists in both the OPML and OCS formats even though these are distinct from the format and data model I use internally. This importation is done by applying an XSLT to the input OPML or OCS file to convert it to my internal format, then converting that XML into the RSS Bandit object model. The stylesheets took me about 15 to 30 minutes each to write. This is the XML-based solution.
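A rough sketch of what that import step amounts to, shown as plain tree-walking code rather than the XSLT stylesheet RSS Bandit actually uses. OPML's outline/@xmlUrl attribute is real; the internal feeds/feed/link element names here are invented for illustration.

```python
import xml.etree.ElementTree as ET

opml = ET.fromstring(
    "<opml version='1.1'><body>"
    "<outline title='Oleg' xmlUrl='http://example.org/oleg/rss.xml'/>"
    "<outline title='Sam' xmlUrl='http://example.org/sam/rss.xml'/>"
    "</body></opml>"
)

feeds = ET.Element("feeds")          # hypothetical internal root element
for outline in opml.iter("outline"):
    url = outline.get("xmlUrl")
    if url:                          # skip folder-only outline nodes
        feed = ET.SubElement(feeds, "feed", title=outline.get("title", ""))
        ET.SubElement(feed, "link").text = url

print(len(feeds.findall("feed")))    # 2
```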

Folks like Shelley believe my problem could be better solved by RDF and other Semantic Web technologies. For example, if my internal format was RDF/XML and I was trying to import an RDF-based format such as OCS then instead of using a language like XSLT that performs a syntactic transform of one XML format to the other I'd use an ontology language such as OWL to map between the data models of my internal format and OCS. This is the RDF-based solution.

Right off the bat, it is clear that both approaches share certain drawbacks. In both cases, I have to come up with a transformation from one representation of a feed list to another. Ideally, for popular formats there would be standard transformations authored by others (e.g. I don't have to write a transformation for WordML to HTML, but I do for WordML to my custom document format), so developers who stick to popular formats simply have to locate the transformation as opposed to actually authoring it themselves.

However, there are further drawbacks to the semantics-based approach than to the XML-based syntactic approach. In certain cases, where the mapping isn't merely a matter of declaring equivalencies between the semantics of similarly structured elements (e.g. element renaming, such as stating that a url and a link element are equivalent), an ontology language is insufficient while a Turing-complete transformation language like XSLT is not. A good example comes from RSS Bandit. In various RSS 2.0 feeds there are two popular ways to specify the date an item was posted: the pubDate element, which is described as containing a string in the RFC 822 format, and the dc:date element, which is described as containing a string in the ISO 8601 format. Thus even though both elements are semantically equivalent, syntactically they are not. This means there still needs to be a syntactic transformation applied after the semantic transformation if one wants an application to treat pubDate and dc:date as equivalent. So instead of one pass with an XSLT stylesheet in the XML-based solution, two transformation techniques are needed in the RDF-based solution, and it is quite likely that one of them would be XSLT.
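The syntactic half of the problem is small enough to sketch directly: pubDate and dc:date name the same date, but each still needs its own parser before an aggregator can treat them uniformly. The helper function below is a hypothetical sketch, not code from RSS Bandit, using Python's standard library parsers for the two formats.

```python
from datetime import date
from email.utils import parsedate_to_datetime

def parse_item_date(element_name, text):
    """Normalize either date element to a plain datetime.date."""
    if element_name == "pubDate":        # RFC 822, as used by RSS 2.0
        return parsedate_to_datetime(text).date()
    elif element_name == "dc:date":      # ISO 8601, as used by Dublin Core
        return date.fromisoformat(text[:10])
    raise ValueError("unknown date element: " + element_name)

a = parse_item_date("pubDate", "Tue, 11 Nov 2003 15:43:42 GMT")
b = parse_item_date("dc:date", "2003-11-11")
print(a == b)   # True
```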

The other practical concern is that I already know XSLT and have good books to choose from to learn about it such as Michael Kay's XSLT : Programmer's Reference and Jeni Tennison's XSLT and XPath On The Edge as well as mailing lists such as xsl-list where experts can help answer tough questions.

From where I sit, picking an XML-based solution over an RDF-based one when it comes to issues involving the interchange of XML documents just makes a lot more sense. I hope this post helps clarify my original points.

Ken MacLeod also wrote

In his article, Dare suggests that XSLT can be used to transform to a canonical format, but doesn't suggest what that format should be or that anyone is working on a common, public repository of those transforms.

The transformation is to whatever target format the consumer is comfortable dealing with. In RSS Bandit the transformations are OCS/OPML to my internal feed list format and RSS 1.0 to RSS 2.0. There is no canonical transformation to one Über XML format that will solve everyone's problems. As for keeping a common, public repository of such transformations, that is an interesting idea which I haven't seen anyone propose in the past. A publicly accessible database of XSLT stylesheets for transforming between RSS 1.0 and RSS 2.0, WordML and HTML, etc. would be a useful addition to the XML community.

Sam Ruby muddies the waters in his post Blind Spots and subsequent comments in that thread by confusing the use cases around XML as a data interchange format and XML as a data storage format. My comments above have been about XML as a data interchange format; I'll probably post more in the future about RDF vs. XML as a data storage format, using the thread in Sam's blog for context.


 

Categories: XML

Ken MacLeod writes

Clay Shirky criticizes the Semantic Web in his article, The Semantic Web, Syllogism, and Worldview, to which Sam Ruby accurately assesses, "Two parts brilliance, one part strawman."

Joe Gregorio responds to Shirky's piece with this very concrete statement:

This is exactly the point I made in The Well-Formed Web, that the value that the proponents of the Semantic Web were offering could be achieved just as well with just XML and HTTP, and we are doing it today with no use of RDF, no need to wait for ubiquitous RDF deployment, no need to wait for RDF parsing and querying tools.

Yet, in the "just XML" world there is no one that I know of working on a "layer" that lets applications access a variety of XML formats (schemas) and treat similar or even logically equivalent elements or structures as if they were the same. This means each XML application developer has to do all of the work of integrating each XML format (schema): N × M.

The difference between the RDF proponents and the XML proponents is fairly simple. In the XML-centric world, parties can utilize whatever internal formats and data sources they want but exchange XML documents that conform to an agreed upon format; in cases where the agreed upon format conflicts with internal formats, technologies like XSLT come to the rescue. The RDF position is that it is too difficult to agree on interchange formats, so instead of going down this route we should use A.I.-like technologies to map between formats. Note that this doesn't mean transformations don't need to be done, as Ken points out

The RDF model along with the logic and equivalency languages, like OWL (nee DAML+OIL),

Thus, if you are an XML practitioner, RDF doesn't change much except for adding new transformation techniques and technologies to learn.

Additionally, as Clay Shirky points out, on investigation it isn't even clear whether the basic premises of RDF and similar Semantic Web technologies are based on a firm foundation and sound logic. In conclusion, Ken wrote:

One can take potshots at RDF for how it addresses the problem, and the Semantic Web for possibly reaching too far too quickly in making logical assertions based on relations modeled in RDF, but to dismiss it out of hand or resort to strawmen to attack it all while not recognizing the problem it addresses or offering an alternative solution simply tells me they don't see the problem, and therefore have no credibility in knocking RDF or the Semantic Web for trying to solve it.

I wonder if I'm the only one that sees the parallels between the above quote and statements attributed to religious fundamentalists. I wonder if Ken is familiar with Perpetual Motion Machines? The problem they aim to solve is real albeit impossible to solve. Does he also feel that no one has the credibility to knock any of the numerous designs that have been proposed until the critic can themselves produce a perpetual motion machine?


 

Categories: XML

November 10, 2003
@ 03:42 AM

Say hello to Mike Deem


 

Categories: Life in the B0rg Cube

November 10, 2003
@ 03:40 AM

I saw the Regina Carter Quintet at Dimitrou's Jazz Alley last night and I must say I never thought a jazz violinist would sound so good. If you live in the Seattle area and have never checked out the Jazz Alley you need to give it a try. It's definitely the place to go for a nice night of good music and good food with a loved one.

 


 

Categories: Ramblings

Miguel De Icaza recently wrote

To make Linux a viable platform for mainstream desktop use our community needs to realize the importance of these third-party vendors and not alienate them. Having a stable API, and a stable ABI is very important for this reason. GNOME has learned this lesson and has strict commitments on ABI/API stability (thanks to our friends at Sun that pushed for this) and the XFree folks deserve the credit for making ABI compatibility across operating systems a reality for their drivers. Two good steps in the right direction.

This highlights one of the primary differences between amateur and professional software development. To professional software developers, phrases like "backwards compatibility" and "no breaking changes" are words to live by, regardless of how painful this can be at times; to amateur software developers, they are the equivalent of four-letter words to be ignored in the quest to build something "faster, better and cheaper" than the rest.

 

An example of this just hit me in RSS Bandit development. I recently released RSS Bandit v1.2.0.42, which added:

The ability to store and retrieve feed list from remote locations such as a dasBlog blog, an FTP server or a network file share. This enables users utilizing RSS aggregators on multiple machines to synchronize their feed list from a single point. This feature has been called a subscription harmonizer by some.

Around the time I shipped this release, a new version of dasBlog (which is managed by Clemens Vasters) shipped and removed the XML Web Service end points that RSS Bandit depended upon for this feature. To see the difference you can compare the ConfigEditingService.asmx from my weblog to the ConfigEditingService.asmx on Torsten's weblog; note that they are almost completely different. In the space of one release, dasBlog broke a feature in RSS Bandit and any other client written to target those end points.

Behavior like this is the bane of the Open Source world that Miguel decried. It's always so easy to change the source code and tell people to download the latest bits that few ever think of the churn they cause by indiscriminately breaking applications; after all, users can always download the newest bits or tweak the code themselves in the worst case.

Instead of finishing up the perf improvements I had in mind for RSS Bandit, I now have to figure out how to support both dasBlog APIs in a seamless manner without placing the burden on end users. You'll note that both my blog and Torsten's state that they are running dasBlog v1.3.3266.0 even though they expose different XML web service end points, which means I can't even expect users to be able to tell which version of dasBlog they are running with any accuracy.
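The usual shape of a fix like this is try-then-fall-back: attempt the newer endpoint and silently retry against the older one when the call fails, since the reported version number can't be trusted. This is only a sketch of that pattern, not the actual RSS Bandit code, and all of the function names are invented.

```python
def fetch_feed_list(fetch_via_new_api, fetch_via_old_api):
    """Try the newer web service endpoint first, falling back to the
    older one if the call fails for any reason."""
    try:
        return fetch_via_new_api()
    except Exception:
        return fetch_via_old_api()

# Usage: simulate a server that only exposes the old endpoint.
def new_api():
    raise ConnectionError("endpoint not found")   # simulated 404

def old_api():
    return ["http://example.org/rss.xml"]

feeds = fetch_feed_list(new_api, old_api)
print(feeds)
```

In a real client one would likely cache which endpoint succeeded per server, so the failed probe isn't repeated on every synchronization.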

What a waste of a Sunday evening...


 

Categories: RSS Bandit