November 30, 2003
@ 06:39 PM

The reviews are right, this game is the shit. It's been a while since I've actually said "Wow" out loud several times while playing a video game. A truly excellent game.


Categories: Ramblings

I recently wrote about LiveJournal's cookie-based authentication mechanism, which makes it difficult for RSS aggregators to read "protected" LiveJournal feeds since the aggregator would have to "steal cookies from your browser instead of using well defined HTTP authentication mechanisms".

My blog post and subsequent email to the LiveJournal development team resulted in the following response and discussion by the LiveJournal developer community, as well as the following [excerpted] email response from Brad Fitzpatrick:

We don't intend for aggregators to support our authentication system, and
we don't want it to be any sort of standard.  The fact that it works is
just an accident, really:  every page on our site is dynamic, and every
page knows who the remote user is, so when the RSS page queries the
recent entries for that user, the code which provides that is security
aware, and so doesn't provide things which it shouldn't.

Please tell people not to support our auth.  We don't want them to go
through that ugly hassle, and it might even change.  We don't consider it
a stable or supported interface at all.

Our intent is support HTTP Digest Auth in the future (but NOT basic auth)
specifically for RSS/Atom feed pages. 

I guess that clears things up. I'd like to thank the LiveJournal folks for promptly responding to my questions and clarifying the situation. Nice.


Categories: RSS Bandit

November 30, 2003
@ 04:55 PM
Chicken Little: In San Francisco, you never know what you're going to find when you knock on a car window -- but nothing prepared the cops for what they found the night of Nov. 3 down by Aquatic Park.

The window came down and there was a guy with a chicken sitting on his lap and a second chicken in a bag on the passenger seat.

"What's with the chickens?" the cop asked.

"I'm going to take them home and eat them,'' the driver replied.

"Lift up the chicken,'' the cop said.

The driver did -- and the next thing you know, the driver was in cuffs and the chickens were on their way to the humane society -- where (we kid you not) the hens were given a sexual battery exam by a vet the cops called in.

All we can say is, it's going to make for some very interesting testimony on the witness stand.

"But the killer will be the other evidence,'' a law enforcement source said. "A 15-ounce jar of Vaseline... with three feathers in it.''

[via Jamie Zawinski]


November 28, 2003
@ 05:19 PM

The Apple Human Interface Guidelines have a section on consistency which reads in part:


Consistency in the interface allows people to transfer their knowledge and skills from one application to any other... Ask yourself the following questions when thinking about consistency in your product.

Is your product consistent:

  • Within itself?
  • With earlier versions of your product?
  • With Mac OS standards? For example, does your application use the reserved and recommended keyboard equivalents? (See “Keyboard Shortcuts”.)
  • In its use of metaphors?
  • With people’s expectations?

Recently Torsten's been changing the user interface components used by RSS Bandit from the DotNetMagic library to Tim Dawson's Windows Forms controls because the former is no longer free as in beer. Given that we are changing the look and feel of the widgets, Torsten thought this would also be a good time to rearrange some of the menu options and remove some of the toolbar buttons. I tend to disagree. User interface consistency between versions of an application is very important, especially when you consider that such changes mess with the muscle memory of users of older versions of the application.

Torsten has posted screenshots of the new RSS Bandit UI and is asking for feedback. His questions are phrased differently than the ones I'd ask. I'd ask whether users want the user interface to be consistent with old versions of RSS Bandit or not. I'd also ask whether users prefer that we keep the old DotNetMagic user interface or move to Tim Dawson's UI components.

If you use RSS Bandit I'd appreciate your comments.


Categories: RSS Bandit

November 27, 2003
@ 04:51 AM

Robert Scoble writes

Lionel, in my comments: "the problem is that it's "common wisdom" that Microsoft has more than $40 billion in the bank, so your point doesn't *sound* true. "how can they talk about resource constraints with that kind of safe deposit""

This is a common misunderstanding. First of all. That cash isn't just given out willy nilly. It's NOT our money! It belongs to our investors. They want to see it spent properly. Translation: don't let Scoble spend it on whatever he wants!

In 1999, an article called 12 Simple Secrets of Microsoft Management was published. One of its entries, entitled "Shrimp vs. Weenies", is quoted below:

7. "Shrimp vs. Weenies"

Even with its billions upon billions in cash, Microsoft is as frugal as Ebeneezer Scrooge. It's a company that buys canned weenies for food, not shrimp. Until last year, even Bill Gates and his second-in-command Steve Ballmer flew coach. (For scheduling reasons, the company purchased its first corporate jet.) Bucking the trend of most large, wealthy corporations, Microsoft remains in start-up mode where tight budgets are the rule. When you sit back and think about it, this frugality is less surprising and even explains how a company can come to accumulate such great hoards of cash.

This is probably one of the most frustrating things to adjust to as a new hire at Microsoft; resource-strapped teams are the order of the day. There never seem to be enough devs to fix bugs and ship features, or when there are, there aren't enough testers to ensure that the code is up to snuff, so you end up cutting the features anyway. Asking around about this leads to the realization that to many this is The Microsoft Way. I've heard all sorts of justifications for this behavior, from the claim that it leads to managers making better hiring decisions (since they never have as much headcount as they want, they don't waste it hiring people they aren't 100% sure will be good performers) to statements like "it's always been this way". It's hard to argue with this logic given that this practice (and the others listed in the article) have led to one of the most successful companies in the world, with more cash on hand than the annual budget of most third world nations.

However every time we cut some feature because we don't have enough test resources or scrap an idea because we don't have anyone to code it up, I wonder if there's a better way...



Categories: Life in the B0rg Cube

November 26, 2003
@ 10:48 PM

A few months ago Mark Pilgrim posted a blog entry entitled How to Consume RSS Safely where he points out:

RSS, by design, is difficult to consume safely. The RSS specification allows for description elements to contain arbitrary entity-encoded HTML. While this is great for RSS publishers (who can just “throw stuff together” and make an RSS feed), it makes writing a safe and effective RSS consumer application exceedingly difficult. And now that RSS is moving into the mainstream, the design decisions that got it there are becoming more and more of a problem.

HTML is nasty. Arbitrary HTML can carry nasty payloads: scripts, ActiveX objects, remote image “web bugs”, and arbitrary CSS styles that (as you saw with my platypus prank) can take over the entire screen. Browsers protect against the worst of these payloads by having different rules for different “zones”. For example, pages in the general Internet are marked “untrusted” and may not have privileges to run ActiveX objects, but pages on your own machine or within your own intranet can. Unfortunately, the practice of republishing remote HTML locally eliminates even this minimal safeguard.

The workaround Mark proposes is that aggregators strip out a bunch of tags from the HTML content of a feed before displaying it to the user. The only problem with this approach is that sometimes users do want to be able to view this dynamic content, be it Flash animations or special behaviors on hovering the mouse over an image via Javascript. Well, in the next version of RSS Bandit this will be a user configurable option; below is what the default setting for the embedded web browser used by RSS Bandit will be.
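For what it's worth, the kind of stripping Mark suggests can be sketched in a few lines of Python; the tag list below is illustrative, not the exact set his article or RSS Bandit uses:

```python
from html.parser import HTMLParser

# Illustrative set of elements to drop from feed content before display.
UNSAFE_TAGS = {"script", "object", "embed", "iframe", "frame", "style", "link", "meta"}

class FeedSanitizer(HTMLParser):
    """Rebuilds HTML markup, dropping potentially dangerous elements and their content."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside an unsafe element

    def handle_starttag(self, tag, attrs):
        if tag in UNSAFE_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            attr_text = "".join(f' {k}="{v}"' for k, v in attrs if v is not None)
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in UNSAFE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside an unsafe element (e.g. script source) is discarded too.
        if self.skip_depth == 0:
            self.out.append(data)

def sanitize(html):
    parser = FeedSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(sanitize('<p>Hello</p><script>alert("evil")</script><b>world</b>'))
# -> <p>Hello</p><b>world</b>
```

A real aggregator would also need to handle CSS payloads and attribute-based scripting (e.g. onclick handlers), which this sketch ignores.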

RSS Bandit browser security settings tab

Categories: RSS Bandit

November 25, 2003
@ 09:30 PM

I'm probably the last geek in the US to have seen Matrix Revolutions and, like most, I'm of two minds about the experience. On the one hand, as an action flick the movie isn't bad, but as a Matrix sequel it has just too many issues, which will probably prevent the multiple repeat viewings that I have enjoyed with the previous two movies.

Looking at the comments on the recent Slashdot poll about Matrix Revolutions, it seems most people had to come up with "deeper meanings" for the movie to prevent watching it from seeming like a waste of money. I've tried but I can't; as a Matrix movie it was anti-climactic, especially after the confusing roller coaster ride that was Matrix Reloaded. Like everyone else, my beef is with the large number of unanswered questions from the previous movies. The paucity of martial arts fighting in this movie was also a minus.

However, if this was the first movie in the series I'd seen I'd probably have considered it a good movie.


Categories: Movie Review

The LiveJournal FAQ states

All journals on LiveJournal have an RSS feed, located at a URL of the form, where "exampleusername" is replaced by your username.

Only the 25 most recent entries are displayed on this RSS feed. Protected entries are visible if the user requesting the RSS feed is able to authenticate with LiveJournal and has permission to see the entries. For example, if you view your RSS feed in your browser while logged in, you will see all your most recent entries in it. However, someone who is not logged in, or someone you do not list as a friend, would not be able to see any protected entries in the feed. For most RSS aggregators and newsreaders, this will mean that only public entries are included. This is because they generally do not provide any means of cookie authentication.

I can't tell which stuns me more: the fact that LiveJournal implemented an "authentication" mechanism that requires RSS aggregators to steal cookies from your browser instead of using well defined HTTP authentication mechanisms, or the fact that they implemented this ghetto authentication mechanism knowing full well that most aggregators don't support it.

Based on my reading of the FAQ, a user has to log in via the website, then somehow pass the cookie sent from the server in the HTTP response to their aggregator of choice, which then uses this cookie in HTTP requests for the RSS feed? All this instead of password protecting the RSS feed using standard web practices?
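To see why this rankles, here is roughly what the two approaches look like to an aggregator author, sketched in Python; the feed URL and cookie name below are placeholders, not LiveJournal's actual values:

```python
import urllib.request

# Placeholder feed URL, not LiveJournal's real URL scheme.
feed_url = "http://example.com/users/exampleusername/rss"

# The cookie approach: the aggregator must replay a session cookie
# lifted from the user's browser. Cookie name/value are made up here.
cookie_request = urllib.request.Request(
    feed_url,
    headers={"Cookie": "ljsession=<value copied from the browser>"},
)

# The standard approach: the aggregator holds the user's credentials and
# the HTTP library answers the server's authentication challenge itself.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, feed_url, "exampleusername", "secret")
opener = urllib.request.build_opener(
    urllib.request.HTTPDigestAuthHandler(password_mgr)
)
# opener.open(feed_url) would respond to a 401 Digest challenge automatically.

print(cookie_request.get_header("Cookie"))
```

The first approach requires digging a session token out of the browser's cookie store; the second needs only the username and password the user already knows.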

We just got a feature request to somehow support this in RSS Bandit, but it seems so wrong to encourage this broken design chosen by LiveJournal that I'm tempted to refuse the request. Is there anyone else subscribed to a LiveJournal RSS feed who thinks having this feature (the ability to view protected LiveJournal feeds) is important? So far, I believe this is the first LiveJournal specific request we've gotten.


Categories: RSS Bandit

Via a post on Don Box's weblog I noticed that quotes from my weblog have been used to further an incorrect assumption about Microsoft's technological direction with regards to XML technologies in future versions of Windows (aka Longhorn) and other products.

Steve Gillmor writes

A key inducement for migrating to Longhorn is WinFS. FS means future storage, and the scheme is a new file storage system that will make it easier to store and find data. Instead of leveraging the XSD standard, Microsoft designers rolled a new schema language to handle WinFS' new capabilities

Clearly, Microsoft wants developers to create tomorrow's applications on Longhorn and WinFS. Right?  So why did Dare Obasanjo, program manager for .Net Framework XML schema technologies, have this to say: "The W3C XML Schema Definition language is far from being targeted for elimination from Microsoft's actively developed portfolio." Obasanjo listed a dozen Microsoft products using XSD, including "Yukon," Visual Studio .Net, "Indigo," Word, Excel and InfoPath

The last three form the core of Office System 2003, which Bill Gates touted as the strategic development platform for the near future at the New York launch. With Longhorn still far away, Microsoft is asking developers to invest in XSD for now—only to have to unlearn and migrate when Longhorn appears in 2006.

As several people have pointed out, WinFS schema and XSD do completely different things. A few people have suggested that Microsoft "embrace and extend" XSD to make it suitable for describing WinFS types, but bitter experience has shown that this course of action usually leads to confusion amongst our customers and recrimination from industry watchers. In the words of Chris Rock, "You could drive a car with your feet if ya want to, that doesn't make it a good idea!".

However, Steve Gillmor's piece does point out that the next couple of Microsoft releases targeted at developers will bring a number of new technologies for developers to learn, and there will be pushback from those who don't see why they have to adjust to the changing landscape. Just today, I got an email from someone who pointed out that users of data access technologies in the .NET Framework will now have almost half a dozen distinct query languages to choose from when retrieving data, including OPath, XPath, XQuery, and SQL. There are reasons why each one exists:

  • OPath is an object query language
  • SQL is a relational query language
  • XPath is a dynamically typed language for addressing parts of an XML document
  • XQuery is a statically typed language for performing sophisticated queries on one or more XML documents.
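To make the contrast concrete, here is the same toy question ("which orders total more than 100?") asked of a relational store in SQL and of an XML document along XPath lines; the schema and data are invented for illustration, and since Python's ElementTree only supports a small XPath subset, the numeric predicate is applied in Python:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational answer, in SQL. The orders table is made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50.0), (2, 150.0), (3, 200.0)])
sql_ids = [row[0] for row in
           conn.execute("SELECT id FROM orders WHERE total > 100")]

# XML answer. In full XPath 1.0 this would be: /orders/order[@total > 100]/@id
xml_doc = ET.fromstring(
    "<orders>"
    "<order id='1' total='50'/>"
    "<order id='2' total='150'/>"
    "<order id='3' total='200'/>"
    "</orders>")
xpath_ids = [o.get("id") for o in xml_doc.findall("order")
             if float(o.get("total")) > 100]

print(sql_ids)    # [2, 3]
print(xpath_ids)  # ['2', '3']
```

Same question, two quite different idioms: one set-oriented over typed columns, the other path-oriented over a node tree.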

However, to state it bluntly, twice as many query languages will exist when the next versions of SQL Server & Visual Studio ship than in the last versions (OPath & XQuery are the newcomers). I suspect that much the same way Steve Gillmor is writing "the sky is falling" style articles about the fact that there will be a schema language for describing WinFS types separate from that for describing XML documents (yet, as Mike Deem points out, no one seems to be asking why not use SQL 'CREATE TABLE' statements to define WinFS types), there will be similar complaints about the amount of choice we are giving developers with regards to data access technologies and query languages.

Sometimes I wonder whether developers would prefer an Über-language with everything and the kitchen sink integrated into it. Would developers really prefer that instead of having divergent query languages we just had one (i.e. SQL) with proprietary extensions for the different data domains, used ubiquitously to query XML documents, in-memory objects, relational databases, text files, etc.? If reporters like Jon Udell and Steve Gillmor are to be believed then this is the preferred approach to building software, since on the surface people get to reuse their skills, except that things will work differently than they expect. I'm actually curious to hear from developers who read my weblog as to which approach they think is preferable. For example, should one use SQL to query relational databases and XPath/XQuery for XML, or should SQL be the universal query language used by all, with any additions needed for XML querying grafted on to it, most likely in a proprietary manner?

This inquiring mind would like to know.


Categories: Life in the B0rg Cube

November 23, 2003
@ 01:11 AM

Last night I went to an Irish bar with a couple of friends to watch the Rugby World Cup. It was a well-fought match that went into overtime with a number of tense moments, eventually resulting in England being victorious over Australia. The price of admission was a bit steep ($20) but the raucous bar atmosphere was a fun way to watch my first rugby match. It reminded me of American Football with no pads, plus soccer-isms like "offsides", "throw ins" and "free kicks". The fact that the ball could only be moved forward by running or kicking, which explained all the backward passes, was also quite different from American Football. Definitely an interesting experience.

Last weekend I was at the Drunk Puppet Nite, which also turned out to be an interesting experience. Although the fliers make it seem like it's all puppet shows, there were at least three dramatic pieces without puppets of the nine or ten I saw. The quality of the show ranged from very good to abysmal. Some of the puppet shows were funny because they were well done (the one with the kid whose talking toilet convinces him to steal laxatives so he can get to "eat some butt chocolate") while others were funny because they were so poorly done (two guys who seemed to have been tripping off of acid with hand puppets, arguing about who ate what from whose refrigerator). Other aspects of the show were just plain weird, for instance the scene that consisted entirely of two matronly women at a church service [complete with choir music in the background] who ate bananas in a very suggestive manner. The show cost $15; considering that this is the price of two movie tickets or three movies from Blockbuster, I'd say that price was a little steep and $10 would be more fair. It definitely beat sitting around the house though.

On an unrelated note, one thing that connected both nights in my mind was that at both events I was the only black guy. Just me, no other persons of African descent in the audience. I'm completely used to it now but often wonder if it shouldn't bother me in some way.

Anyway, I'm off to get a haircut.


Categories: Ramblings

Robert Scoble writes

We talked about a bunch of things. I laid out some things that I'd like to see RSS become. I'm gonna talk to Dave about that.

For instance, I have a vision of a day when every single Microsoft employee will have a weblog. Now, what happens when you have 55,000 people weblogging inside of a corporation? Well, for one, I want to see weblogs in different ways? Why shouldn't it be possible to see results from a search engine in order of where you are on the org chart, for instance? So, how can you match RSS data up with your domain data that's stored in Exchange and/or other corporate data stores?

How about seeing data from corporate webloggers based on revenues? Or other metrics?

Also, one thing I miss is being able to tell readers what I think are my most important items.

A number of these features have nothing to do with RSS, although I've seen several people claim otherwise; Scoble's post is just the most recent. Lots of people (including myself) see RSS news aggregators as a step towards building a universal information aggregator. The closest things to universal information aggregators that exist today are Personal Information Managers like Microsoft Outlook. At its most basic, Outlook is a mail reader, meaning it knows how to use SMTP to send messages from one server to another and how to retrieve messages using either POP or IMAP. However, over time Outlook has evolved into my primary interface for accessing information about people I interact with on a daily basis. I usually ask Outlook questions like:

  1. Where are all the messages I've received from person X [about topic Y]?
  2. Where are all the messages I have to respond to in the next Z days?
  3. Where are all the messages I received between date A and date B?
  4. Who is person X in my organization (who's his boss? what is his title?) 
  5. What is person X's schedule like for today or for the week?
  6. What meeting rooms are available for today or for the week?
  7. What do I have scheduled to do today?
  8. What is person X's phone number or email address?
  9. How do I let everyone who sends me a message know that I'll be on vacation?
  10. Show me internal discussion forums or mailing lists about topic X?

All of this functionality is exposed in a consistent user interface, and it is hard for me as an end user to tell whether SMTP, IMAP, POP3 or whatever else is being used to service these requests. This is the same direction I believe people will want to go with news aggregators, especially when I read some of the forward thinking feature requests that come from people like Marc Canter and Ray Ozzie. Even though at its most primal an RSS news aggregator is a client that polls for messages in RSS format using HTTP, there is a lot more functionality people want from clients.

The obvious (and in my opinion the wrong) way to satisfy these feature requests is to add a lot of extra yet optional functionality to the base protocol, RSS. However, using the Outlook example, it is clear that one doesn't need to go this route to solve the problem; after all, not all 10 of the pieces of functionality I described above have to do with SMTP (although a lot do).

There are two reasons why I find WinFS interesting when it comes to building a universal information aggregator, both of which have been pointed out by Mike Deem. The first is that WinFS will be an item store, which means it will be possible to store abstract concepts such as "person" or "contact" on your file system as opposed to just concrete files such as buddylist.xml. The second is that WinFS's schemas play an even larger part in making WinFS what it is: the idea that there will be a common schema for "Person", "Document" and "Album" that can be shared, and extended, by thousands of Windows applications. The really important thing is having shared concepts and schemas for both local applications and globally networked applications. Being able to actually store a person's contact info, weblog posts, mail messages, schedule and more on the file system, and have these all linked together without the limitation that they all have to be in the same file or that one must tie a client application to a database product, makes developing a lot of the functionality I get from Outlook (or that Scoble would like to get from an RSS aggregator) much simpler. Being able to retrieve a calendar event from the Internet as XML, either via some XML Web Service or HTTP GET, then map it to a local concept of a calendar event on my machine which could then be used across applications, would be very useful.

Of course, WinFS is currently at a stage that people like Diego Doval would call vaporware, so this is just supposition on my part from reading Mike Deem's blog and conversations we've had, as opposed to stuff that will actually ship in a future version of Windows. Even then the programming model may leave much to be desired, e.g. the following code snippet from Mike Deem's blog:

For example, to find all the people who live in the New York metropolitan region, you would write code like the following:

Person.FindAll("IncomingRelationships.Cast(System.Storage.Contacts.HouseholdMember).Household.OutgoingRelationships.Cast(System.Storage.Core.ContactLocations).Location.MetropolitanRegion = 'New York'");

So needless to say it isn't a slam dunk that "WinFS will solve all our problems", but I think the general ideas and functionality it brings to the table could prove very useful. In the meantime, I plan to hack the features I believe should be in a universal information aggregator client into RSS Bandit and will work with like-minded souls to move the state of the art in that direction.


Categories: RSS Bandit

In his recent article entitled Binary Killed the XML Star? Kendall Clark writes

Many XML proponents and users came out of various binary exchange and format camps, and they are very unwilling to return to what were for them, or so it would seem, dark days. In this case, however, given the real power of those who most seem to want a binary variant -- including Sun, IBM, and Microsoft -- they may have to adopt a carefully tactical plan to limit the damage, rather than preventing the fight completely.

This claim by Kendall Clark seems to contradict the conclusions in the position papers provided by both Microsoft and IBM at the W3C Workshop on Binary Interchange of XML Information Item Sets.

IBM's position paper concludes with

IBM believes that wherever possible, implementations of the existing XML 1.x Recommendation should be optimized to meet the needs of customers. While we expect to see non-standard binary forms used internally within certain vendors’ implementations, including perhaps our own, we are not yet convinced that there is justification to standardize an interchange format other than XML 1.x. We thus believe that it would be premature for the W3C to launch a formal workgroup, or to recharter an existing group, to develop a Binary XML Recommendation

Microsoft's position paper concludes with

For different classes of applications, the criterion (minimize footprint or minimize parse/generate time) for the binary representation is different and often conflicting. There is no single criterion that optimizes all applications. Consequently, a binary standard could result in a suite of allowable representations that clients and servers must be prepared to receive and process. This is a retrograde step from the portability goals of XML 1.0. Furthermore, the optimal binary representation depends on the machine and OS architectures on each end — translating between binary representations negates much of the advantages that binary XML has over text.

Besides the position paper from Microsoft, there have been many comments, both in weblogs and on mailing lists, from Microsoft people against this movement for a standardized binary XML format (oxymoron that it is). There have been weblog posts by myself, Joshua Allen and Omri Gazitt (all of whom work on XML technologies at Microsoft) decrying the movement towards binary XML and the resulting potential fragmentation of the XML world.

There have also been a number of posts by Microsoft employees against standardized binary XML on mailing lists such as XML-DEV, some of which have been quoted on Elliotte Rusty Harold's Cafe con Leche XML News website:

I fear that splitting the interop story of XML into a textual and Infoset-based/binary representation, we are going to get the "divide and conquer" effect that in the end will make XML just another ASN.1: a niche model that does not deliver the interop it promises and we will be back to lock-in.

--Michael Rys on the xml-dev mailing list, Tue, 18 Nov 2003

XML has succeeded in large part because it is text and because it is perceived as "the obvious choice" to many people. The world was a lot different before XML came around, when people had to choose between a dizzying array of binary and text syntaxes (including ASN.1). Anyone who tries to complicate and fragment this serendipitous development is, IMO, insane.

--Joshua Allen on the xml-dev mailing list, Tue, 18 Nov 2003

Unfortunately, it seems that Kendall Clark must have missed the various discussions, weblog posts and the position paper where Microsoft's view of the importance of textual XML 1.0 was put forth.


Categories: XML

November 18, 2003
@ 08:46 PM

Robert Scoble writes

Rob Fahrni answered back and said "Scoble's on one of the best teams inside Microsoft." I've landed on a good one, yes, but I totally disagree that it's the best. I've seen tons of teams that are doing interesting things. By the way, he says Visio is a failure? Well, does the Visio team have any webloggers? Does it have an RSS feed? How are you supposed to sell software if you don't have a relationship with your customers?

On the surface it reads like Robert Scoble is claiming that if you don't have a blogger on your team or an RSS feed then you don't have a relationship with your customers. This is probably the funniest thing I've seen all week.

Scoble's post reminds me of the Cult of the Cluetrain Manifesto article by John Dvorak. It's always unfortunate when people take a bunch of decent ideas and turn them into near-religious beliefs. Being in touch with your customers in an informal and accessible manner is nice, but it isn't the only way to communicate with your customers, nor is it necessary to make you successful.

I love my iPod. I love my TiVo. I love my Infiniti G35. I love Mike's Hard Lemonade and Bacardi O3. None of these products have official webloggers that I'm aware of, nor do they have an RSS feed for their websites that I'm subscribed to. Furthermore, if competing products did, it wouldn't change the fact that I'd still be all over my iPod/TiVo/G35/etc.

Blogging and RSS feeds are nice, but they are the icing on the cake of interacting with and satisfying your customers' needs, not the be-all and end-all of it.


Categories: Ramblings

Elliotte Rusty Harold writes

In XSLT 1.0 all output is XML. A transformation creates a result tree, which can always be serialized as either an XML document or a well-formed document fragment. In XSLT 2.0 and XQuery the output is not a result tree. Rather, it is a sequence. This sequence may contain XML; but it can also contain atomic values such as ints, doubles, gYears, dates, hexBinaries, and more; and there's no obvious or unique serialization for these things. For instance, what exactly are you supposed to do with an XQuery that generates a sequence containing a date, a document node, an int, and a parentless attribute? How do you serialize this construct? That a sequence has no particular connection to an XML document was very troubling to many attendees.

Looking at it now, I'm seeing that perhaps the flaw is in thinking of XQuery as like XSLT; that is, a tool to produce an XML document. It's not. It's a tool for producing collections of XML documents, XML nodes, and other non-XML things like ints. (I probably should have said it that way last night.) However, the specification does not define any concrete serialization or API for accessing and representing these non-XML collections. That's a pretty big hole left to implementers to fill.

The main benefit of XQuery is as a better way to retrieve data from one or more XML documents than previous methods (i.e. a better XPath), not as a way to transform one XML structure into another (i.e. XSLT). I assume Elliotte Rusty Harold isn't familiar with APIs that provide XPath as a standalone language, such as the .NET Framework's XPathNavigator, the Oracle XDK, or Jaxen, since all of these provide a way to get atomic values (number, string, or boolean) as well as nodes when querying an XML document.

Similarly, there is no well defined way to serialize the results of applying an arbitrary XPath to an XML document. The tough parts for implementers aren't atomic values or XML fragments as Elliotte Rusty Harold describes, but more mundane things like attribute values. For instance, consider the following document

<test xmlns:e="" e:id="1" />

queried using the following XPath expression

/test/@*[1]
which returns the first attribute of the document element. How would one serialize these results? There are a bunch of options such as

  1. e:id="1"
  2. {}id="1"
  3. @e:id="1"
  4. {xmlns:e=""}id="1"

All of which I could argue are valid serializations of the attribute node returned by that query. By the way, the .NET Framework uses the first serialization if one calls XmlNode.OuterXml on the XmlAttribute object returned by executing that query on an XmlDocument object.
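As another data point, Python's ElementTree picks yet another convention, exposing namespaced attributes in James Clark's {namespace-uri}local-name form, much like the second option above. A small sketch, using a made-up namespace URI for illustration:

```python
import xml.etree.ElementTree as ET

# The namespace URI "urn:example" is invented for this illustration.
doc = ET.fromstring('<test xmlns:e="urn:example" e:id="1" />')

# ElementTree hands the attribute back under Clark notation:
# the prefix is discarded and the namespace URI is inlined.
for name, value in doc.attrib.items():
    print(f'{name}="{value}"')
# -> {urn:example}id="1"
```

So even outside serialization specs, toolkits have had to make (and have made different) arbitrary choices about how to name and render attribute nodes.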

So what's my point? That the situation Elliotte Rusty Harold bemoans as being unique to XQuery has always existed with XPath. Even more, as Oleg Tkachenko points out, there is an XSLT 2.0 and XQuery 1.0 Serialization draft recommendation which specifies how to serialize instances of the XPath 2.0/XQuery data model, and which even resolves the question of how one would serialize the results of the query above:

It is a serialization error if an item in the sequence is an attribute node or a namespace node.

Short answer, you can't.


Categories: XML

Mike Sanders says that businesses are clamoring for web based apps:

SEPTEMBER 29, 2003 ( INFOWORLD ) - Web applications rule the enterprise. That's the indisputable conclusion to be drawn from this year's InfoWorld Programming Survey. Despite directives from Microsoft Corp. and others that developers abandon server-based HTML applications for fat desktop clients, the ease of "zero deployment" through the browser continues to win the day.

Only a fool would count Microsoft out. But only a fool would ignore what businesses are proclaiming loudly from their desktops - we want more browser apps now

[via James Robertson]

One of the biggest problems I face as a Program Manager is that product teams often focus on features instead of functionality. Since Microsoft is very developer-centric, it is very easy for us to focus on the implementation details of customer requests instead of focusing on their requirements and business cases. The job of a PM is to ensure that we focus on the latter instead of the former.

The InfoWorld article and the subsequent comment by Mike Sanders are examples of concentrating on features ([D]HTML applications) as opposed to functionality (zero deployment applications).

The primary message of the InfoWorld article isn't that users do not want rich client applications, as Mike Sanders implies, but that they'd rather have zero deployment than a rich client. The lesson I take away is that if one plans to provide a rich client solution then it should be a zero-deployment solution as well*.

* In today's world, this typically means using Flash or Javascript.

Categories: Life in the B0rg Cube

From Microsoft Announces Availability of Open and Royalty-Free License For Office 2003 XML Reference Schemas

Microsoft Corp. today announced the availability of a royalty-free licensing program for its Microsoft® Office 2003 XML Reference Schemas and accompanying documentation. ... Microsoft's new Office 2003 versions of Word, Excel and the InfoPath (TM) information-gathering program utilize schemas that describe how information is stored when documents are saved as XML....

To ensure broad availability and access, Microsoft is offering the royalty-free license using XML Schema Definitions (XSDs), the cross-industry standard developed by the W3C. The license provides access to the schemas and full documentation to interested parties and is designed for ease of use and adoption. The Microsoft Office 2003 XML Reference Schemas include WordprocessingML (Microsoft Office Word 2003), SpreadsheetML (Microsoft Office Excel 2003) and FormTemplate XML schemas (Microsoft Office InfoPath 2003).

The biggest gripe when Office 2003's XML support was announced was that the schemas for WordprocessingML (aka WordML) and co. were proprietary. This was reported in a number of fora including Slashdot & C|Net News. I wonder how many of them will carry the announcement that these schemas are available for all to peruse and reuse in a royalty-free manner?

Update: On C|Net news: Microsoft pries open Office 2003

Update2: On Slashdot: Microsoft Word Document ML Schemas Published


Categories: XML

November 17, 2003
@ 06:32 AM

George Mladenov asked

Why does XAML need to be (well-formed) XML in the first place?

To which Rob Relyea responds with the following reasons

1.      Without extra work from the vendors involved, we’d like all XML editors to be able to work with XAML.

2.      We’d like transformations (XSLT, other) to be able to move content to/from XAML.

3.      We didn’t want to write our own file parsing code; the parser code we do have is built on top of System.Xml.XmlTextReader.  We are able to focus on our value add.

Thus it looks like XAML's use of XML passes the XML Litmus Test, specifically

Using XML for a software development project buys you two things: (a) the ability to interoperate better with others and (b) a number of off-the-shelf tools for dealing with the format. If neither of these things applies to a given situation then it doesn't make much sense to use XML.

However there are tradeoffs to using XML, some of which Rob points out. They are listed below with some of my opinions

1.      We want to enable setting the Background property on a Button (for example) in one of two ways:

a.       Using a normal attribute - <Button Background=”Red”>Click Here</Button>

b.      Using our compound property syntax –


c.       Ideally if somebody tried to use both syntaxes at the same time, we could error.  XML Schema – as far as I am aware – isn’t well equipped to describe that behavior.


Being the PM for W3C XML Schema technologies in the .NET Framework means I get to see variations of this request regularly. This feature is typically called co-occurrence constraints; it is lacking in W3C XML Schema but is supported by other XML schema languages like RELAX NG and can be added to W3C XML Schema using Schematron annotations. Given the existing complexity of W3C XML Schema's conflicting design goals (validation language vs. type system) and contradictory rules, I for one am glad this feature doesn't exist in the language.

However, this means users who want to describe their vocabularies using W3C XML Schema must face the fact that not all of their constraints can be expressed in the schema. This is always the case anyway; it's just that some constraints seem significant enough to belong in the schema while others are OK being checked in code during "business logic processing". In such cases there are basically three choices: (i) try to come as close as possible to describing the content model in the schema, which sometimes leads to what us language lawyers like to call "gross hacks"; (ii) use an alternate XML schema language or extend W3C XML Schema in some way; or (iii) live with the fact that some constraints won't be describable in the schema.
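As an illustration of checking such a co-occurrence constraint in "business logic processing" rather than in the schema, here is a minimal Python sketch; the compound-property child element name is an assumption based on the XAML example above:

```python
import xml.etree.ElementTree as ET

# Sketch of enforcing a co-occurrence constraint in code: the Background
# property may be set as an attribute OR via the compound property syntax
# (assumed here to be a <Button.Background> child), but not both.
def check_background(button: ET.Element) -> None:
    set_as_attribute = 'Background' in button.attrib
    set_as_element = button.find('Button.Background') is not None
    if set_as_attribute and set_as_element:
        raise ValueError('Background set both as an attribute '
                         'and via compound property syntax')

# The attribute-only form passes the check.
check_background(ET.fromstring('<Button Background="Red">Click Here</Button>'))

# Using both syntaxes at once is rejected.
try:
    check_background(ET.fromstring(
        '<Button Background="Red"><Button.Background/></Button>'))
except ValueError as error:
    print(error)
```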

It is a point of note that although the W3C XML Schema recommendation contains what seems like a schema for Schema (sForS), i.e. the rules of W3C XML Schema described as a schema, this is in fact not the case. The schema in the spec, although normative, is invalid, and even if it were valid it still would not come close to rigidly specifying all the rules of W3C XML Schema. The way I look at it is simple: if the W3C XML Schema working group couldn't come up with a way to fully describe an XML vocabulary using XML Schema, then the average vocabulary designer shouldn't be bothered if they can't either.

2.      It is a bit strange for designers or developers moving from HTML to XML.  HTML is forgiving.  XML isn’t.  Should we shy away from XML so that people don't have to put quotes around things?  I think not.

Having to put quotes around everything isn't the biggest pain in the transition from HTML to XML, and after a while it comes naturally. A bigger pain is dealing with ensuring that nested tags are properly closed and I'm glad I found James Clark's nxml-mode for Emacs which has helped a lot with this. The XML Editor in the next version of Visual Studio should also be similarly helpful in this regard.

The lack of the HTML predefined entities is also a bit of a culture shock when moving to XML from HTML, and one some consider a serious bug with XML; I tend to disagree.

3.      It is difficult to keep XAML as a human readable/writable markup, as that isn’t one of XML’s goals.  I think it needs to be one of XAML’s goals.  It is a continual balancing act.

Actually, one of the main goals of XML is to be human-readable, at least as human-readable as HTML, since it was intended to replace HTML in the beginning. There's a quick history lesson in my SGML on the Web: A Failed Dream? post from earlier last month.


Categories: XML

November 17, 2003
@ 04:30 AM

I just finished watching a TiVoed episode of Justice League in which a character died in battle. The character had been a moderately recurring one who was given some depth in the preceding episode before being killed off in the following one. Coupled with Disney's Brother Bear, in which a major character who has just been introduced ends up dying and another whose significance we learn later dies as well, it seems that death in children's cartoons is no longer taboo.

I remember watching cartoons like Voltron & Thunder Cats as a kid and thinking that the fact that the major characters were never at risk of death made rooting for the good guys or against the bad guys a waste of time. Of course, I was one of the kids who was deeply affected when Optimus Prime bought it in Transformers: The Movie. That death was a solitary event in the cartoon landscape that didn't start the trend I expected, and it was itself diluted by the fact that they kept bringing Optimus Prime back in one shape or form every other episode.

This trip down memory lane makes me nostalgic for old episodes of my favorite cartoons. Time to go bargain hunting on Amazon.




Categories: Ramblings

November 15, 2003
@ 11:31 PM

We finally got around to adding some screen shots to the RSS Bandit wiki.

For those who are curious, there should be another release in the next couple of weeks. This should be mostly a bug fix release with a number of improvements in responsiveness of the GUI. The only noticeable new features should be a new preferences tab for adding search engines to the ones available from the search bar, the ability to apply themes to feed items from the preferences dialog without having to exit the dialog and the ability to search RSS items on disk.  

Hopefully, if I can get some cooperation from a couple of folks there also may be some changes to the subscription harmonization functionality.


Categories: RSS Bandit

Robert Scoble writes

Microsoft has 55,000 employees. $50 billion or so in the bank.

Yet what has gotten me to use the Web less and less lately? RSS 2.0.

Seriously. I rarely use the browser anymore (except to post my weblog since I use Radio UserLand).

See the irony there? Dave Winer (who at minimum popularized RSS 2.0) has done more to get me to move away from the Web than a huge international corporation that's supposedly focused on killing the Web.

Diego Duval responds

Robert: the web is not the browser.

Robert says that he's "using the web less and less" because of RSS. He's completely, 100% wrong.

RSS is not anti-web, RSS is the web at its best.

The web is a complex system, an interconnection of open protocols that run on any operating system
Let me say it again. The web is not the browser. The web is protocols and formats. Presentation is almost a side-effect.

Both of them have limited visions of what actually constitutes the World Wide Web. The current draft of the W3C's Architecture of the World Wide Web gives a definition of the Web that is more consistent with reality and highlights the limitations of both Diego's and Robert's opinions of what constitutes the WWW. The document currently states

The World Wide Web is a network-spanning information space consisting of resources, which are interconnected by links defined within that space. This information space is the basis of, and is shared by, a number of information systems. Within each of these systems, agents (e.g., browsers, servers, spiders, and proxies) provide, retrieve, create, analyze, and reason about resources.

This contradicts Robert's opinion that the web is simply about HTML pages that you can view in a Web browser, and it contradicts Diego's statement that the Web is about "open" protocols that run on "any" operating system. There are a number of technologies that populate the Web whose "open-ness" some may question. I know better than to cast stones when I live in a glass house, but a few prominent examples come to mind.

The way I read it, the Web is about URIs that identify resources that can be retrieved using HTTP by user agents. In this case, I agree with Diego that RSS 2.0 is all about the Web. A news aggregator is simply a Web agent  that retrieves a particular Web resource (the RSS feed) at periodic intervals on behalf of the user using HTTP as the transfer protocol.
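For illustration, such a Web agent can be sketched in a few lines of Python; the feed URL and user-agent string are placeholders:

```python
import urllib.request

# A minimal sketch of a news aggregator as a Web agent: an HTTP GET for a
# feed URI, with a conditional-request header so periodic polls only pull
# the feed when it has changed. The URL and agent name are placeholders.
request = urllib.request.Request(
    'http://example.org/rss.xml',
    headers={
        'User-Agent': 'ExampleAggregator/1.0',
        # Last-Modified value remembered from the previous fetch
        'If-Modified-Since': 'Tue, 11 Nov 2003 00:00:00 GMT',
    })

# with urllib.request.urlopen(request) as response:  # network call elided
#     feed_xml = response.read()
```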


Categories: Ramblings

November 14, 2003
@ 04:22 PM

 Fumiaki Yoshimatsu writes

Why does someone still think that they have to write Unicode BOMs by themselves, digging deep inside XmlTextWriter.BaseStream and UnicodeEncoding.GetPreamble?  Encoding hint in the XML declarations and Unicode BOMs are all about XML 1.0 thing, but WriteStartElement and WriteStartDocument are not.  They are InfoSet thing, so they do not have anything to do with the serialization format.  Think about XmlNodeWriter for example.  Why does XmlNodeWriter NOT have any constructor that have a parameter of type Encoding?  Why does it always call XmlDocument.CreateXmlDeclaration with null as the second argument?

This is a common point of confusion for users of XML in the CLR. XmlNodeWriter doesn't have a parameter of type Encoding because it writes to an XmlDocument which is stored in memory, and all strings in the CLR are in the UTF-16 encoding. Setting the encoding only matters when saving the XmlDocument to a stream. As for having to dig into XmlTextWriter.BaseStream to set the encoding, I find this weird considering that the XmlTextWriter constructor has a number of ways of specifying the encoding when instantiating the class. Since XML 1.0 mandates that an XML document can have only one encoding, there is no reason for methods like WriteStartElement and WriteStartDocument to concern themselves with encoding issues.
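The same separation shows up in other XML APIs. Here is a Python sketch in which the in-memory tree carries no encoding at all, and the encoding is chosen once when the whole document is written to a stream:

```python
import io
import xml.etree.ElementTree as ET

# The in-memory tree has no encoding; its strings are just Unicode.
root = ET.Element('root')
ET.SubElement(root, 'child').text = 'data'

# The encoding is a property of the serialized document as a whole,
# chosen once at write time -- not something individual elements carry.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding='utf-16', xml_declaration=True)
text = buf.getvalue().decode('utf-16')
print(text)
```

Writing the same tree again with a different encoding requires no changes to the tree itself, which is the point: encoding is a serialization concern.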

If you really want to dive deep into issues involving specifying the encoding of XML documents and the CLR take a look at  this discussion in Robert McLaws's weblog.

PS: One of my pet peeves is the way people misuse the term XML infoset to mean "things in XML I don't care about" even though there is a precise definition (nay, an entire spec) that describes what it means. The document information item clearly has a [character encoding scheme] property, which means character encodings are an XML infoset thing.


Categories: XML

November 14, 2003
@ 05:11 AM

Irwando the Magnificent (king of SQLXML) just pointed me at iPocalypse Photoshop. A number of the pseudo-engravings are quite amusing, my favorites are "Stolen music is better than sex" and "once you've had small and white..."

The photoshopped iPods in the scenes from Eddie Murphy's Haunted Mansion are also worth a snicker or two.



It looks like it's confirmed that I'll be attending XML 2003.

Should be fun.


Categories: Ramblings

Oleg Tkachenko writes

Just found new beast in the Longhorn SDK documentation - OPath language:

The OPath language is the query language used to query for objects using an ObjectSpace. The syntax of OPath also allows you to query for objects using standard object oriented syntax. OPath enables you to traverse object relationships in a query as you would with standard object oriented application code and includes several operators for complex value comparisons.

Orders[Freight > 5].Details.Quantity > 50 OPath expression should remind you something familiar. Object-oriented XPath cross-breeded with SQL? Hmm, xml-dev flamers would love it.

The approach seems to be exactly opposite to ObjectXPathNavigator's one - instead of representing object graphs in XPathNavigable form, brand new query language is invented to fit the data model. Actually that makes some sense, XPath as XML-oriented query language can't fit all. I wonder what Dare think about it. More studying is needed, but as for me (note I'm not DBMS-oriented guy though) it's too crude yet

Oleg is right that an XML-oriented query language like XPath isn't a good fit for querying objects. There is definitely an impedance mismatch between XML and objects, many aspects of which were pointed out by Erik Meijer in his paper Programming with Circles, Triangles and Rectangles. A significant number of constructs and semantics of XPath simply don't make sense in a language designed to query objects. The primary construct in XPath is the location step, which consists of an axis, a node test and zero or more predicates, of which both the axis and the node test are out of place in an object query language.

From the XPath Grammar, there are 13 axes of which almost none make sense for objects besides self. They are listed below

[6]    AxisName    ::=    'ancestor'
| 'ancestor-or-self'
| 'attribute'
| 'child'
| 'descendant'
| 'descendant-or-self'
| 'following'
| 'following-sibling'
| 'namespace'
| 'parent'
| 'preceding'
| 'preceding-sibling'
| 'self'

The ones related to document order, such as preceding, following, preceding-sibling and following-sibling, don't really apply to objects since there is no concept of order amongst the properties and fields of a class. The attribute axis is similarly inapplicable since there is no equivalent of the distinction between elements and attributes among the fields and properties of a class.

The axes related to document hierarchy, such as parent, child, ancestor, descendant, etc., look like they may map to object-oriented concepts until one asks what exactly is meant to be the parent of an object. Is it the base class or the object to which the current object belongs as a field or property? Most would respond that it is the latter. However, what happens when multiple objects have the same object as a field, which is often the case since object structures are graph-like, not tree-like as XML structures are? It also gets tricky when an object that is a field in one class is a member of a collection in another class. Is the object a child of the collection? If so, what is the parent of the object? If not, what is the relationship of the object to the collection? The questions can go on...
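A tiny Python sketch of the ambiguity, using made-up classes:

```python
# A sketch of why "parent" is ambiguous for objects: the same object can be
# a field of several objects and a member of a collection all at once,
# whereas an XML node has exactly one parent. Class names are illustrative.
class Address:
    def __init__(self, city):
        self.city = city

class Person:
    def __init__(self, name, address):
        self.name = name
        self.address = address

home = Address('Lagos')
ade = Person('Ade', home)
bola = Person('Bola', home)
addresses = [home]  # the same object is also a collection member

# Which of ade, bola or addresses is home's "parent"? All three refer
# to the very same object, so an XPath-style parent axis has no answer.
assert ade.address is bola.address is addresses[0]
```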

On the surface the namespace axis sounds like it could map to concepts from object-oriented programming, since languages like C#, C++ and Java all have a concept of a "namespace". However, namespace nodes in the XPath data model have distinct characteristics, such as the fact that each element node in a document has its own set of namespace nodes regardless of whether these namespace nodes represent the same mapping of a prefix to a namespace URI.

A similar argument can be made about node tests, the second primary construct in XPath location steps. A node test specifies either a name or a type of node to match. A number of XPath node types don't have equivalents in the object-oriented world, such as comment and processing instruction nodes. Other nodes, such as text and element nodes, are problematic when one tries to tie them in to the various axes such as the parent axis.

Basically, a significant amount of XPath is not really applicable to querying objects without changing the semantics of certain aspects of the language in a way that conflicts with how XPath is used when querying XML documents.

As for how this compares to my advocacy of XML-to-object mapping techniques such as the ObjectXPathNavigator, the answer is simple: XML is the universal data interchange format, and the software world is moving to a situation where all the major sources of important data can be accessed or viewed as XML, from office documents to network messages to information locked within databases. It makes sense then that, in creating this universal data access layer, one create a way for all interesting sources of data to be viewed as XML so they too can participate as input for data aggregation technologies such as XSLT or XQuery, enabling the reuse of XML technologies for processing and manipulating them.


Categories: Life in the B0rg Cube | XML

November 11, 2003
@ 11:10 PM

I noticed the following RDF Interest Group IRC chat log discussing my recent post More on RDF, The Semantic Web and Perpetual Motion Machines in my referrer logs. I found the following excerpts quite illuminating

15:43:42 <f8dy> is owl rich enough to be able to say that my <pubDate>Tue, Nov 11, 2003</pubDate> is the same as your <dc:date>2003-11-11</dc:date>

15:44:35 <swh> shellac: I believe that XML datatypes are...


16:08:15 <f8dy> that vocabulary also uses dates, but it stores them in rfc822 format

16:08:51 <f8dy> 1. how do i programmatically determine this?

16:08:58 <JHendler> ah, but you cannot merge graphs on things without the same URI, unless you have some other way to do it

16:09:02 <f8dy> 2. how do i programmatically convert them to a format i understand?


16:09:40 <shellac> 1. use


16:10:13 <shellac> 1. use a xsd library

16:10:32 <shellac> 2. use an xsd library


16:11:08 <JHendler> n. use an xsd library :->

16:11:30 <shellac> the graph merge won't magically merge that information, true

16:11:34 <JHendler> F: one of my old advisors used to say the only thing better than a strong advocate is a weak critic

This argument cements my suspicion that using RDF and Semantic Web technologies is a losing proposition when compared to using XML-centric technologies for information interchange on the World Wide Web. It is quite telling that none of the participants who tried to counter my arguments gave a cogent response besides "use an xsd library", when in fact anyone with a passing knowledge of XSD would inform them that XSD only supports ISO 8601 dates and would barf on RFC 822 dates if asked to treat them as dates. In fact, this is a common complaint about XSD dates from our customers w.r.t. internationalization [that, and the fact that decimals use a period as a delimiter instead of a comma for fractional digits].

Even in this simple case of mapping equivalent elements (dc:date and pubDate), the Semantic Web advocates cannot show how their vaunted ontologies solve a problem the average RSS aggregator author solves in about five minutes of coding using off-the-shelf XML tools. It is easy to say philosophically that dc:date and pubDate are, after all, both dates; it is another thing to write code that knows how to treat them uniformly. I am quite surprised that such a straightforward real-world example cannot be handled by Semantic Web technologies. Clay Shirky's The Semantic Web, Syllogism, and Worldview makes even more sense now.
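For illustration, here is roughly what that five minutes of coding looks like as a Python sketch using off-the-shelf date parsers (the sample date strings are my own):

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Treating pubDate (RFC 822) and dc:date (ISO 8601) uniformly with
# off-the-shelf parsers -- roughly the five minutes of coding in question.
def parse_item_date(element_name: str, text: str) -> datetime:
    if element_name == 'pubDate':            # RFC 822 date, as in RSS 2.0
        return parsedate_to_datetime(text)
    return datetime.fromisoformat(text)      # ISO 8601 date, as in dc:date

d1 = parse_item_date('pubDate', 'Tue, 11 Nov 2003 00:00:00 GMT')
d2 = parse_item_date('dc:date', '2003-11-11')
print(d1.date(), d2.date())  # both parse to the same calendar date
```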

One of my co-workers recently called RDF an academic plaything. After seeing how many of its advocates ignore the difficult real-world problems faced by software developers and computer users today while pretending that obtuse solutions to trivial problems are important, I've definitely lost any interest I had left in investigating the Semantic Web any further.


Categories: XML

November 11, 2003
@ 03:40 PM

From the Memory Hole

The Memory Hole posted an extract from an essay by George Bush Sr. and Brent Scowcroft, in which they explain why they didn't have the military push into Iraq and topple Saddam during Gulf War 1. Although there are differences between the Iraq situations in 1991 and 2002-3, Bush's key points apply to both.

But a funny thing happened. Fairly recently, Time pulled the essay off of their site. It used to be at this link, which now gives a 404 error. If you go to the table of contents for the issue in which the essay appeared (2 March 1998), "Why We Didn't Remove Saddam" is conspicuously absent.

Ever since September 11, 2001 the news continues to sound more and more like excerpts from George Orwell's 1984. All is not lost though, it has been heartening to see that some teachers are using this incident as a way to teach their students about media literacy. My favorite is Rewriting History: The Dangers of Digitized Research by Peg Hesketh 


Categories: Ramblings

I always love the Top 50 IRC Quotes. Warning, some of them are a bit risqué.


Categories: Ramblings

My post from yesterday garnered a couple of responses from the RDF crowd who questioned the viability of the approaches I described. Below I take a look at some of their arguments and relate them to practical examples of exchanging information using XML I have encountered in my regular development cycle.  

Shelley Powers writes

One last thing: I wanted to also comment on Dare Obasanjo's post on this issue. Dare is saying that we don't need RDF because we can use transforms between different data models; that way everyone can use their own XML vocabulary. This sounds good in principle, but from previous experience I've had with this type of effort in the past, this is not as trivial as it sounds. By not using an agreed on model, not only do you now have to sit down and work out an agreement as to differences in data, you also have to work out the differences in the data model, too. In other words -- you either pay upfront, once; or you keep paying in the end, again and again. Now, what was that about a Perpetual Motion Machine, Dare?

In responding to Shelley's post it is easier for me to use a concrete example. RSS Bandit uses a custom format that I came up with for describing a user's list of subscribed feeds. However, in the wild, other news aggregators use differing formats such as OPML and OCS. To ensure that users who've used other aggregators can try out RSS Bandit without having to manually enter all their feeds, I support importing feed subscription lists in both the OPML and OCS formats even though they are distinct from the format and data model I use internally. This importation is done by applying an XSLT to the input OPML or OCS file to convert it to my internal format, then converting that XML into the RSS Bandit object model. The stylesheets took me about 15 to 30 minutes to write for each one. This is the XML-based solution.
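As a sketch of the same import idea without XSLT, here is a Python fragment that lifts subscriptions out of an OPML document into a hypothetical internal format (the internal element and attribute names are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Lift feed URLs out of an OPML subscription list into a hypothetical
# internal <feeds> format; this mirrors the syntactic transform the
# XSLT stylesheet performs, just written imperatively.
opml = ET.fromstring(
    '<opml version="1.1"><body>'
    '<outline title="Example feed" xmlUrl="http://example.org/rss.xml"/>'
    '</body></opml>')

feeds = ET.Element('feeds')
for outline in opml.iter('outline'):
    url = outline.get('xmlUrl')
    if url:  # category outlines carry no xmlUrl and are skipped
        ET.SubElement(feeds, 'feed',
                      {'title': outline.get('title', ''), 'url': url})

print(ET.tostring(feeds, encoding='unicode'))
```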

Folks like Shelley believe my problem could be better solved by RDF and other Semantic Web technologies. For example, if my internal format was RDF/XML and I was trying to import an RDF-based format such as OCS then instead of using a language like XSLT that performs a syntactic transform of one XML format to the other I'd use an ontology language such as OWL to map between the data models of my internal format and OCS. This is the RDF-based solution.

Right off the bat, it is clear that both approaches share certain drawbacks. In both cases, I have to come up with a transformation from one representation of a feed list to another. Ideally, for popular formats there would be standard transformations described by others for moving from one popular format to another (e.g. I don't have to write a transformation for WordML to HTML but do for WordML to my custom document format), so developers who stick to popular formats simply have to locate the transformation as opposed to actually authoring it themselves.

However, there are further drawbacks to the semantics-based approach than to the XML-based syntactic approach. In certain cases, where the mapping isn't merely a case of showing equivalencies between the semantics of similarly structured elements (e.g. element renaming, such as stating that a url and a link element are equivalent), an ontology language is insufficient while a Turing-complete transformation language like XSLT is not. A good example comes, again, from RSS Bandit. In various RSS 2.0 feeds there are two popular ways to specify the date an item was posted: the first is the pubDate element, which is described as containing a string in the RFC 822 format, while the other is the dc:date element, which is described as containing a string in the ISO 8601 format. Thus even though both elements are semantically equivalent, syntactically they are not. This means there still needs to be a syntactic transformation applied after the semantic transformation if one wants an application to treat pubDate and dc:date as equivalent. So instead of making one pass with an XSLT stylesheet as in the XML-based solution, two transformation techniques will be needed in the RDF-based solution, and it is quite likely that one of them will be XSLT.

The other practical concern is that I already know XSLT and have good books to choose from to learn about it such as Michael Kay's XSLT : Programmer's Reference and Jeni Tennison's XSLT and XPath On The Edge as well as mailing lists such as xsl-list where experts can help answer tough questions.

From where I sit picking an XML-based solution over an RDF-based one when it comes to dealing with issues involving interchange of XML documents just makes a lot more sense. I hope this post helps clarify my original points.

Ken MacLeod also wrote

In his article, Dare suggests that XSLT can be used to transform to a canonical format, but doesn't suggest what that format should be or that anyone is working on a common, public repository of those transforms.

The transformation is to whatever target format the consumer is comfortable dealing with. In RSS Bandit the transformations are OCS/OPML to my internal feed list format and RSS 1.0 to RSS 2.0. There is no canonical transformation to one Über XML format that will solve everyone's problems. As for keeping a common, public repository of such transformations, that is an interesting idea which I haven't seen anyone propose in the past. A publicly accessible database of XSLT stylesheets for transforming between RSS 1.0 and RSS 2.0, WordML to HTML, etc. would be a useful addition to the XML community.

Sam Ruby muddies the waters in his post Blind Spots and subsequent comments in that thread by confusing the use cases of XML as a data interchange format and XML as a data storage format. My comments above have been about XML as a data interchange format; I'll probably post more in the future about RDF vs. XML as a data storage format, using the thread in Sam's blog for context.


Categories: XML

Ken MacLeod writes

Clay Shirky criticizes the Semantic Web in his article, The Semantic Web, Syllogism, and Worldview, to which Sam Ruby accurately assesses, "Two parts brilliance, one part strawman."

Joe Gregorio responds to Shirky's piece with this very concrete statement:

This is exactly the point I made in The Well-Formed Web, that the value that the proponents of the Semantic Web were offering could be achieved just as well with just XML and HTTP, and we are doing it today with no use of RDF, no need to wait for ubiquitous RDF deployment, no need to wait for RDF parsing and querying tools.

Yet, in the "just XML" world there is no one that I know of working on a "layer" that lets applications access a variety of XML formats (schemas) and treat similar or even logically equivalent elements or structures as if they were the same. This means each XML application developer has to do all of the work of integrating each XML format (schema): N × M.

The difference between the RDF proponents and the XML proponents is fairly simple. In the XML-centric world, parties can utilize whatever internal formats and data sources they want but exchange XML documents that conform to an agreed-upon format; in cases where the agreed-upon format conflicts with internal formats, technologies like XSLT come to the rescue. The RDF position is that it is too difficult to agree on interchange formats, so instead of going down this route we should use A.I.-like technologies to map between formats. Note that this doesn't mean transformations don't need to be done, as Ken points out

The RDF model along with the logic and equivalency languages, like OWL (nee DAML+OIL),

Thus, if you are an XML practitioner RDF doesn't change much except new transformation techniques and technologies to learn.

Additionally, as Clay Shirky points out, on investigation it isn't even clear whether the basic premises of RDF and similar Semantic Web technologies rest on a firm foundation and sound logic. In conclusion Ken wrote,

One can take potshots at RDF for how it addresses the problem, and the Semantic Web for possibly reaching too far too quickly in making logical assertions based on relations modeled in RDF, but to dismiss it out of hand or resort to strawmen to attack it all while not recognizing the problem it addresses or offering an alternative solution simply tells me they don't see the problem, and therefore have no credibility in knocking RDF or the Semantic Web for trying to solve it.

I wonder if I'm the only one who sees the parallels between the above quote and statements attributed to religious fundamentalists. I wonder if Ken is familiar with Perpetual Motion Machines? The problem they want to solve is real, albeit impossible to solve. Does he also feel that no one has the credibility to knock any of the numerous designs for one that have been proposed until the critic can produce a perpetual motion machine themselves?


Categories: XML

November 10, 2003
@ 03:42 AM

Say hello to Mike Deem


Categories: Life in the B0rg Cube

November 10, 2003
@ 03:40 AM

I saw the Regina Carter Quintet at Dimitrou's Jazz Alley last night and I must say I never thought a jazz violinist would sound so good. If you live in the Seattle area and have never checked out the Jazz Alley you need to give it a try. It's definitely the place to go for a nice night of good music and good food with a loved one.



Categories: Ramblings

Miguel De Icaza recently wrote

To make Linux a viable platform for mainstream desktop use our community needs to realize the importance of these third-party vendors and not alienate them. Having a stable API, and a stable ABI is very important for this reason. GNOME has learned this lesson and has strict commitments on ABI/API stability (thanks to our friends at Sun that pushed for this) and the XFree folks deserve the credit for making ABI compatibility across operating systems a reality for their drivers. Two good steps in the right direction.

This highlights one of the primary differences between amateur and professional software development. To professional software developers, phrases like "backwards compatibility" and "no breaking changes" are words to live by, regardless of how painful that can be at times; to amateur software developers they are the equivalent of four-letter words, to be ignored in the quest to build something "faster, better and cheaper" than the rest.


An example of this just hit me in RSS Bandit development. I recently released RSS Bandit v1.2.0.42 which added

The ability to store and retrieve feed list from remote locations such as a dasBlog blog, an FTP server or a network file share. This enables users utilizing RSS aggregators on multiple machines to synchronize their feed list from a single point. This feature has been called a subscription harmonizer by some.

Around the time I shipped this release a new version of dasBlog (which is managed by Clemens Vasters) shipped and removed the XML Web Service end points that RSS Bandit was dependent upon for this feature. To see the difference you can compare the ConfigEditingService.asmx from my weblog to the ConfigEditingService.asmx on Torsten's weblog. Note that they are almost completely different. In the space of one release, dasBlog broke a feature in RSS Bandit and any other client that was written to target those end points.

Behavior like this is the bane of the Open Source world that Miguel decried. It's always so easy to change the source code and tell people to download the latest bits that few ever think of the churn they cause by indiscriminately breaking applications; after all, users can always download the newest bits or, in the worst case, tweak the code themselves.

Instead of finishing up the perf improvements I had in mind for RSS Bandit, I now have to figure out how to support both dasBlog APIs in a seamless manner without placing the burden on end users. You'll note that both my blog and Torsten's state that they are running dasBlog v1.3.3266.0 even though they expose different XML web service end points, which means I can't even expect users to tell me with any accuracy what version of dasBlog they are running.

What a waste of a Sunday evening...


Categories: RSS Bandit

Joe Gregorio has a new blog post entitled Longhorn versus the light of day where he makes some valid points about three of the major technologies mentioned at Microsoft's Professional Developers Conference (namely Indigo, Avalon and WinFS), such as the fact that they are anywhere from two to three years from shipping, and that it'll be a number of years after that until their usage is widespread enough to make a significant difference. However, he also claims the technologies aren't even worthwhile; specifically, he states

Three major components, all useless in their own right, now stacked together on a shaky foundation to make an even more useless heap.

Now this statement doesn't seem to jibe with any of the opinions I've seen of people who actually develop for Windows and have heard of the three aforementioned technologies. Below are my one sentence summaries of all three

  • Avalon: A set of new UI widgets for the operating system and a managed programming model for accessing them.
  • Indigo: The next generation of DCOM except this time it is built on open standards and will interop with any platform with an XML parser as opposed to the old days when it was mainly a Windows<->Windows thing.
  • WinFS: A consistent way to add and query metadata to files.

As a hobbyist Windows developer who's built a Windows rich client application using C# (i.e. RSS Bandit) all three of the above are very useful advances.

  • Avalon: Now I can actually build a rich UI in C# without (a) having to use P/Invoke or COM interop to talk to native Windows widgets or (b) using a third-party library that uses P/Invoke or COM interop to do the same. Currently RSS Bandit uses both, and the free third-party UI library I was using (the Magic Library) just went commercial ($200 a pop). I don't see why, after paying money for Windows and Visual Studio, I still have to cough up dollars or jump through hoops before I can build a UI with dockable windows. Anything Microsoft does to remedy this is a good thing as far as I'm concerned.

  • Indigo: RSS Bandit already uses Web Services to talk to servers; for instance, for subscription harmonization, where I post my feed list to my dasBlog weblog, I talk to the ConfigEditingService it exposes. However, there is a lot more I'd like RSS Bandit to be able to do when communicating with weblogs, such as querying what services they support (i.e. WS-Policy) and then, when communication occurs, doing so in a flexible yet secure manner (i.e. WS-Security, WS-SecureConversation and WS-Authorization). Indigo will give me all this as operating system facilities, and they will interop with any platform where an implementation of these specs exists.

  • WinFS: The most frequently requested feature for RSS Bandit is the ability for users to search the feeds they've downloaded on disk. Currently I have two choices here: store all the XML in objects and loop over them to perform searches, or store the entries on disk as XML and fire up an XPath engine that loops over each directory and queries each file. The first is fairly memory intensive while the second is slow. I suspect SharpReader does the former, which explains why people regularly complain that it sometimes uses over 100MB of memory. With WinFS I'll be able to take the on-disk approach, with performant query handled by the OS and the metadata to search against added by RSS Bandit.
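The slow on-disk approach described above is simple to sketch. Here is a loose Python analogue (not RSS Bandit's actual code; the cache layout of one XML file per feed is an assumption) that parses every cached feed file and scans its items:

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET

def search_feeds(directory, term):
    """Naive on-disk search: parse each cached feed file and scan every
    item's title and description for the search term."""
    hits = []
    for path in glob.glob(os.path.join(directory, "*.xml")):
        tree = ET.parse(path)  # re-parsing every file is what makes this slow
        for item in tree.iter("item"):
            text = ((item.findtext("title") or "") + " " +
                    (item.findtext("description") or ""))
            if term.lower() in text.lower():
                hits.append(item.findtext("title"))
    return hits

# Demonstrate with a throwaway cache directory containing one feed file.
cache = tempfile.mkdtemp()
with open(os.path.join(cache, "feed1.xml"), "w") as f:
    f.write("<rss><channel><item><title>WinFS notes</title>"
            "<description>metadata and query</description></item>"
            "</channel></rss>")

assert search_feeds(cache, "metadata") == ["WinFS notes"]
```

Every search re-parses every file, which is why pushing the indexing and querying down into the storage system is attractive.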

Even as a hobbyist developer working on Windows technologies, all three of the technologies Joe Gregorio calls useless will be of use to me. I can only assume that Mr. Gregorio is not a Windows developer, which would explain his point of view.


Categories: Life in the B0rg Cube

November 7, 2003
@ 03:23 PM

I've posted previously on why I think the recent outcry for the W3C to standardize on a binary format for the representation of XML information sets (aka "binary XML") is a bad idea which could cause significant damage to interoperability on the World Wide Web. Specifically I wrote

Binary XML Standard(s): Just Say No

Omri and Joshua have already posted the two main reasons why attempting to create a binary XML standard is folly: (a) the various use cases and requirements are contradictory (small message size for low bandwidth situations versus minimal parsing/serialization time for situations where minimizing processing time is prime), thus a single standard is unlikely to satisfy a large proportion of the requesters, and (b) creation of a binary XML standard, especially by an organization such as the W3C, muddies the water with regards to interop. People already have to worry about the interop pain that will occur whenever XML 1.1 gets out of the door (which is why Elliotte Rusty Harold advises avoiding it like the plague), let alone adding one or more binary XML standards to the mix.

I just read the report from the W3C Workshop on Binary Interchange of XML Information Item Sets and I'm glad to see the W3C did not [completely] bow to pressure from certain parties to start work on a "binary XML" format. The following is the conclusion from the workshop 


The Workshop concluded that the W3C should do further work in this area, but that the work should be of an investigative nature, gathering requirements and use cases, and prepare a cost/benefit analysis; only after such work could there be any consideration of whether it would be productive for W3C to attempt to define a format or method for non-textual interchange of XML.

See also Next Steps below for the conclusions as they were stated at the end of the Workshop.

This is new ground for the W3C. Usually W3C working groups are formed to take competing requirements from umpteen vendors and hash out a spec. Of course, the problem with this approach is that it doesn't scale. It may have worked for HTML when the competing requirements primarily came from two vendors, but now that XML is so popular it doesn't work quite as well; as Tim Bray put it, "any time there's a new initiative around XML, there are instantly 75 vendors who want to go on the working group".

It's good to see the W3C decide to take an exploratory approach instead of just forging ahead to create a spec that tries to satisfy myriad competing and contradictory requirements. They went down the forge-ahead route before with W3C XML Schema (and to a lesser extent with XQuery), and the software industry is still having difficulty digesting the results. Hopefully at the end of their investigation they'll come to the right conclusions.


Categories: XML

The one where I find out why I'm getting so many referrals from InfoWorld's website.

Categories: Ramblings

One of the biggest concerns about RSS is the amount of bandwidth consumed by wasteful requests. Recently, on an internal mailing list discussion, there was a complaint about the amount of bandwidth wasted because weblog servers send a news aggregator an RSS feed containing items it has already seen. A typical news feed contains 10 - 15 news items, where the oldest is a few weeks old and the newest is a few days old. A typical user fetches an RSS feed with their news aggregator about once every other day. This means on average at least half the items in an RSS feed are redundant to people who are subscribed to the feed, yet everyone (client & server) incurs bandwidth costs by having the redundant items appear in the feeds.

So how can this be solved? All the pieces to solve this puzzle are already on the table. Every news aggregator worth its salt (NetNewsWire, SharpReader, NewsGator, RSS Bandit, FeedDemon, etc.) uses HTTP Conditional GET requests. What does that mean in English? It means that most aggregators send information about when they last retrieved the RSS feed via the If-Modified-Since HTTP header, and the hash code of the RSS feed provided by the server the last time it was fetched via the If-None-Match HTTP header. The interesting point is that although most news aggregators tell the server the last time they fetched the RSS feed, almost no weblog server I am aware of actually uses this information to tailor the information sent back in the RSS feed. The weblog software I use is guilty of this as well.

If you fetched my RSS feed yesterday or the day before there is no reason for my weblog server to send you a 200K file containing five entries from last week which it currently does. Actually it is worse, currently my weblog software doesn't even perform the simple check of seeing whether there are any new items before choosing to send down a 200K file.

Currently the only optimization performed by weblog servers is this: if there are no new items, an HTTP 304 response is sent; otherwise a feed containing the last n items is sent. A further optimization is possible, where the server only sends down the items newer than the If-Modified-Since date sent by the client.
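From the client side, the Conditional GET half of this is straightforward. A minimal Python sketch using only the standard library (the URL and function names are illustrative, not from any particular aggregator):

```python
import urllib.request
import urllib.error

def build_conditional_request(url, last_modified=None, etag=None):
    """Attach the validators saved from the previous fetch, if any."""
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    if etag:
        req.add_header("If-None-Match", etag)
    return req

def fetch_feed(url, last_modified=None, etag=None):
    """Conditional GET. Returns (body, last_modified, etag); body is None
    when the server answers 304 Not Modified, meaning the cached copy is
    still fresh and no feed bytes were transferred."""
    req = build_conditional_request(url, last_modified, etag)
    try:
        with urllib.request.urlopen(req) as resp:
            # 200: new content; remember the validators for the next fetch.
            return (resp.read(),
                    resp.headers.get("Last-Modified"),
                    resp.headers.get("ETag"))
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None, last_modified, etag
        raise
```

The aggregator persists the returned Last-Modified and ETag values between runs and feeds them back into the next request; the server-side tailoring described above is the missing half.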

I'll ensure that this change makes it into the next release of dasBlog (the weblog software I use), and if you use weblog software I suggest requesting that your software vendor do the same.

UPDATE: There is a problem with the above proposal in that it calls for a reinterpretation of how If-Modified-Since is currently used by most HTTP clients and directly violates the HTTP spec which states

b) If the variant has been modified since the If-Modified-Since
         date, the response is exactly the same as for a normal GET.

The proposal is still valid, except that this time, instead of misusing the If-Modified-Since header, I'd propose that clients and servers respect a new custom HTTP header such as "X-Feed-Items-Newer-Than", whose value would be a date in the same format as that used by the If-Modified-Since header.
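The server-side filtering the proposed header (spelled here as X-Feed-Items-Newer-Than) would enable is a one-liner once the dates are parsed. A Python sketch under that assumption, using the RFC 822 date format shared with If-Modified-Since:

```python
from email.utils import parsedate_to_datetime

def items_newer_than(items, header_value):
    """Filter feed items to those published after the client's cutoff.

    `items` is a list of (pub_date_string, item) pairs; `header_value` is
    the hypothetical X-Feed-Items-Newer-Than value, an RFC 822 date in the
    same format as If-Modified-Since."""
    cutoff = parsedate_to_datetime(header_value)
    return [item for date, item in items
            if parsedate_to_datetime(date) > cutoff]

items = [("Sun, 02 Nov 2003 10:00:00 GMT", "old post"),
         ("Sun, 09 Nov 2003 10:00:00 GMT", "new post")]

# Only items published after the client's cutoff go into the response feed.
assert items_newer_than(items, "Wed, 05 Nov 2003 00:00:00 GMT") == ["new post"]
```

Because the new header is advisory, a server that ignores it simply falls back to today's behavior of sending the full feed, so nothing breaks for older clients or servers.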


Categories: XML

November 5, 2003
@ 02:51 AM

The following screenshot reminded me of recent comments by Robert Scoble on blogs and conversational marketing


Categories: Ramblings

In response to my earlier post on his "Replace & Defend" theory Jon Udell writes

We have yet to even scratch the surface of what's possible given these circumstances. And now here comes WinFS with its own proprietary schema language. In recent years, it's been popular to layer innovation on top of base standards. So XSLT, XQuery, and SQL200n all rely on XPath, as WSDL relies on XSD. Yet no base standards beyond XML itself were of use to WinFS? It puzzles me. The things defined in WinFS don't seem exotic or mysterious. "A WinFS Contact type," the docs say, "has the Item super type. Person, Group, and Organization are some of its subtypes." If XSD can't model such things, we're in real trouble.

I doubt that anyone is claiming that W3C XML Schema cannot model containment or type derivations; however, the way it implements type derivation leaves much to be desired. In fact, this is the topic of an article I wrote that showed up last week entitled XML Schema Design Patterns: Is Complex Type Derivation Unnecessary?. This is just another example of how things that seem straightforward end up being fairly complicated in W3C XML Schema. As Don Box puts it, XML Schema has already eclipsed C++ in terms of complexity. Given that the WinFS schema language isn't even about modelling XML documents, it seems perplexing that one would expect it to take on the complexity of W3C XML Schema as its modelling language.
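For reference, modelling the "Person derives from Item" example from the Longhorn docs is certainly possible in W3C XML Schema; even this minimal, hypothetical sketch shows the complexContent/extension boilerplate the derivation requires:

```xml
<!-- Hypothetical illustration, not the actual WinFS schema: deriving a
     Person type from a base Item type in W3C XML Schema. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Item">
    <xs:sequence>
      <xs:element name="DisplayName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Person">
    <xs:complexContent>
      <xs:extension base="Item">
        <xs:sequence>
          <xs:element name="Birthdate" type="xs:date" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

The derivation works, but every extension step drags in two extra levels of nesting, which is exactly the sort of incidental complexity the article discusses.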

Of course WinFS does much more than model datatypes and structures. It's a highly sophisticated storage system that supports relational, object, and XML access styles, and that treats relationships among items as first-class objects in themselves (a potent feature I first encountered in the object database world years ago.) Great stuff! But the terminology of the Longhorn docs is revealing. Person, Contact, and Organization items are referred to as "Windows types," presumably because their schemata appear as classes in Longhorn's managed API. But to me these are universal types, not Windows types. I had expected them to be defined using XML Schema, and to be able to interoperate directly with SOAP payloads and XML documents on any platform.

Being defined using W3C XML Schema and being able to interoperate directly with XML documents on any platform are orthogonal. Information in relational databases like SQL Server is described using relational schema languages (i.e. SQL), yet this hasn't stopped Microsoft from creating myriad ways to extract XML from SQL Server, such as SQLXML, FOR XML queries and the .NET Framework's DataSet class, which allow information stored in relational databases to interoperate directly with SOAP payloads and XML documents. No one would claim that the fact that the data in a relational database is not defined using W3C XML Schema (or RELAX NG) makes it impossible to extract XML from a relational database or view it as an XML data source. WinFS is no different.

It's troubling, though, that the architects must be consulted to find out whether Longhorn's "Windows types" will be transferable to standards-based software.

I don't work on WinFS, so there was no chance I'd make a definitive statement about what features they plan to support or not. This is simply common sense on my part, not an indication one way or the other of the degree of XML support in WinFS. With any luck I'll soon be able to get one of the WinFS folks to start blogging, and then more accurate information can be gotten from the horse's mouth.


Categories: Life in the B0rg Cube

I was planning to write this month's Extreme XML column on the recently released EXSLT.NET implementation produced by myself and a couple of others. One of the cool things about the EXSLT.NET project is that we added the ability to use the EXSLT extension functions in XPath queries over any data source that provides an XPathNavigator (i.e. implements IXPathNavigable). Thus one would be able to use functions like set:distinct and regexp:match when running XPath queries over objects that implement the IXPathNavigable interface such as the XPathDocument, XmlDocument or XmlDataDocument.  

In constructing my examples I decided that it would be even cooler to show the extensibility of the .NET Framework if I showed how one could use the XPath extension functions in queries over implementations of XPathNavigator not provided by the .NET Framework such as my perennial favorite, the ObjectXPathNavigator.

After fixing some bugs in the ObjectXPathNavigator implementation on MSDN (MoveToParent() didn't take you to the root node from the document element and the navigator only exposed public properties but not public fields) I came across a problem which will probably turn into yet another project on GotDotNet workspaces. The heuristics the ObjectXPathNavigator uses to provide an XML view of an arbitrary object graph doesn't take into account the class annotations used by XML Serialization in the .NET Framework. Basically this means that if one reads in an XML document, converts it to objects using the XmlSerializer then creates an ObjectXPathNavigator over the objects...the XML view of the object provided by the ObjectXPathNavigator would not be the same as the XML generated when the class is serialized as XML via the XmlSerializer.

In fact, for the ObjectXPathNavigator to provide the same XML view of an object as the XmlSerializer does would involve having it understand the various attributes for annotating classes in the System.Xml.Serialization namespace. Considering that in the future the XPathNavigator should be the primary API for accessing XML in the .NET Framework, it would be extremely useful if there was an API that allowed any object to be treated as a first class citizen of the XML world. The first step was the XmlSerializer, which allowed any class to be saved and loaded to and from XML streams; the next step should be enabling any object to be accessed in the same way XML documents are. Instant benefits are things like the ability to perform XPath and XSLT over arbitrary objects. In the Whidbey/Yukon (Visual Studio/SQL Server) timeframe this means getting stuff like XQuery over objects or the ability to convert any object graph to an XmlReader for free.
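The reflection heuristic at the heart of the problem is easy to see in miniature. This is a loose Python analogue of the ObjectXPathNavigator idea, not the MSDN implementation; the class and element names are hypothetical:

```python
import xml.etree.ElementTree as ET

def to_xml_view(obj, name):
    """Reflect over an object's public attributes and build an XML view of
    the object graph, one child element per attribute."""
    elem = ET.Element(name)
    for attr in vars(obj):
        if attr.startswith("_"):
            continue  # heuristic: expose only public state
        child = ET.SubElement(elem, attr)
        child.text = str(getattr(obj, attr))
    return elem

class Contact:
    def __init__(self):
        self.name = "Dare"
        self.email = "dare@example.org"
        self._secret = "hidden"

xml = ET.tostring(to_xml_view(Contact(), "Contact"), encoding="unicode")
assert "<name>Dare</name>" in xml and "_secret" not in xml
```

The catch the entry describes is precisely that such a heuristic knows nothing about per-class serialization annotations (renamed elements, attributes versus elements, ignored members), so its XML view and the serializer's output drift apart for any annotated class.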

It looks like I have a winter project, but first I have to finish this month's column on EXSLT.NET. *sigh*


Categories: XML

November 3, 2003
@ 04:04 AM

I just spent two hours trying to figure out why a chunk of code was throwing exceptions when compiled and run from my command line application but similar code worked fine when running in Visual Studio. The problem turned out to be a bug in the .NET Framework's XslTransform class which existed in v1.0 of the .NET Framework but not v1.1. Since I was using Visual Studio .NET 2003 [which uses v1.1 of the .NET Framework] to run my test app but compiling my actual application with the compiler for v1.0 of the .NET Framework, I wasn't actually doing an apples-to-apples comparison when running both apps.

I'm tempted to uninstall v1.0 of the .NET Framework from my machine so I don't end up facing this problem again. What a waste of time.



Categories: Ramblings