July 15, 2006
@ 10:25 PM

Nathan Torkington has a blog post entitled A Week in the Valley: GData on the O'Reilly Radar blog that talks about the growing use of GData and the Atom Publishing Protocol within Google, as well as Mark Lucovsky's take on how this compares to his time at Microsoft working on Hailstorm. Nat writes

They're building APIs to your Google-stored data via GData, and it's all very reminiscent of HailStorm. Mark, of course, was the architect of that. So why's he coming up with more strategies to the same ends? I figure he's hoping Google won't screw it up by being greedy, the way Microsoft did...The reaction to the GData APIs for Calendar have been very positive. This is in contrast to HailStorm, of course, which was distrusted and eventually morphed its way through different product names into oblivion. Noting that Mark's trying again with the idea of open APIs to your personal data, I joked that GData should really be "GStorm". Mark deadpanned, " I wanted to call it ShitStorm but it didn't fly with marketing".

Providing APIs to access and manipulate data owned by your users is a good thing. It extends the utility of the data beyond the Web applications that may be its primary consumers, and it creates an ecosystem of applications that harness the data. This is beneficial to customers, as can be seen from the success of APIs such as the MetaWeblog API, Flickr API or del.icio.us API.

Five years ago, while interning at Microsoft, I saw a demo about Hailstorm in which a user visiting an online CD retailer was shown an ad for a concert they'd be interested in, based on their music preferences in Hailstorm. The thinking here was that it would be win-win because (i) all the user's data is entered and stored in one place, which is convenient for the user, (ii) the CD retailer can access the user's preferences from Hailstorm and cut a deal with the concert ticket provider to show their ads based on user preferences, and (iii) the concert ticket provider gets their ads shown in a very relevant context.

The big problem with Hailstorm is that it assumed that potential Hailstorm partners such as retailers and other businesses would give up their customer data to Microsoft. As expected, most of them told Microsoft to take a long walk off a short pier.

Unfortunately, Microsoft didn't take the step of opening up these APIs to its online services such as Hotmail and MSN Messenger but instead quietly canned the project. Fast forward a few years and the company is now playing catch-up to ideas it helped foster. Amusingly, people like Mark Lucovsky and Vic Gundotra, who were influential during the Hailstorm days at Microsoft, are now at Google rebuilding the same thing.

I've taken a look at GData and have begun to question the wisdom of using Atom/RSS as the baseline for information interchange on the Web. Specifically, I have the same issues that Steven Ickman raised in a comment on DeWitt Clinton's blog, where he wrote

From a search perspective I’d argue that the use of either format, RSS or Atom, is pretty much a hack. I think OpenSearch is awesome and I understand the motivators driving the format choices but it still feels like a hack to me.

Just like you I want to see rich structured results returned for queries but both formats basically limit you to results of a single type and contain a few known fields (i.e. link, title, subject, author, date, & enclosure) that are expected to be common across all items.

Where do we put the 100+ Outlook defined contact fields and how do we know that a result is a contact and not an appointment or auction? Vista has almost 1000 properties defined in its schema so how do we convey that much metadata in a lossless way? Embedded Microformats are a great suggestion for how to deal with richer content but it sort of feels like a hack on top of a hack to me? What’s the Microformat for an auction? Do I have to wait a year for some committee to arrive at joint agreement on what attributes define an auction before I can return structured auction results?

When you have a hammer, everything looks like a nail. It seems Steven Ickman and I reviewed OpenSearch/GData/Atom with the same critical lens and came away with the same list of issues. The only thing I'd change in his criticism is the claim that both formats (RSS & Atom) limit you to results of a single type. That isn't the case. Nothing stops a feed from containing data of wildly varying types. For example, a typical MSN Spaces RSS feed contains items that represent blog posts, photo albums, music lists, and book lists, which are all very different types.
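
To make the point about mixed-type feeds concrete, here is a minimal Python sketch of a single RSS feed whose items are tagged with different types. The namespace URI and the ex:type element are hypothetical, invented purely for illustration; they are not the markup an actual MSN Spaces feed uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace and element name, for illustration only.
EX_NS = "http://example.com/ns/item-type"

FEED = f"""<rss version="2.0" xmlns:ex="{EX_NS}">
  <channel>
    <title>Example space</title>
    <item><title>Weekend trip</title><ex:type>blogpost</ex:type></item>
    <item><title>Beach photos</title><ex:type>photoalbum</ex:type></item>
    <item><title>Summer playlist</title><ex:type>musiclist</ex:type></item>
    <item><title>Reading queue</title><ex:type>booklist</ex:type></item>
  </channel>
</rss>"""


def items_by_type(feed_xml):
    """Group item titles by the value of the (hypothetical) ex:type element."""
    root = ET.fromstring(feed_xml)
    grouped = {}
    for item in root.iter("item"):
        # Items without the extension element fall into an "unknown" bucket.
        kind = item.findtext(f"{{{EX_NS}}}type", default="unknown")
        grouped.setdefault(kind, []).append(item.findtext("title"))
    return grouped


if __name__ == "__main__":
    for kind, titles in items_by_type(FEED).items():
        print(kind, titles)
```

The feed format itself is indifferent to what each item represents. The catch is that a client only knows how to treat a photo album differently from a book list if it already understands the extension element, which is the metadata problem Steven describes.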

The inability to represent hierarchical data in a natural manner is a big failing of both formats. I've seen the Atom Threading Extensions, but that seems to be a very un-XML way for an XML format to represent hierarchy, especially given how complicated message threading algorithms can be for clients to implement.
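
As a rough sketch of what this means for a client, the Python below rebuilds a reply tree from a flat list of Atom entries that point at their parents through thr:in-reply-to, the element defined by the Atom Threading Extensions. The entry ids and titles are made up, and the code assumes every parent appears in the same feed; a real client would also have to cope with missing parents, multiple in-reply-to elements, and replies spread across several feeds.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
THR = "http://purl.org/syndication/thread/1.0"

# Entries are siblings in the XML; the hierarchy exists only via "ref" values.
FEED = f"""<feed xmlns="{ATOM}" xmlns:thr="{THR}">
  <entry><id>tag:example.org,2006:1</id><title>Original post</title></entry>
  <entry><id>tag:example.org,2006:2</id><title>First reply</title>
    <thr:in-reply-to ref="tag:example.org,2006:1"/></entry>
  <entry><id>tag:example.org,2006:3</id><title>Reply to the reply</title>
    <thr:in-reply-to ref="tag:example.org,2006:2"/></entry>
</feed>"""


def build_threads(feed_xml):
    """Return indented titles, reconstructing the tree from in-reply-to refs."""
    root = ET.fromstring(feed_xml)
    titles, children, roots = {}, {}, []
    for entry in root.findall(f"{{{ATOM}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM}}}id")
        titles[entry_id] = entry.findtext(f"{{{ATOM}}}title")
        reply = entry.find(f"{{{THR}}}in-reply-to")
        if reply is None:
            roots.append(entry_id)  # no parent: start of a thread
        else:
            children.setdefault(reply.get("ref"), []).append(entry_id)

    def render(entry_id, depth=0):
        lines = ["  " * depth + titles[entry_id]]
        for child in children.get(entry_id, []):
            lines.extend(render(child, depth + 1))
        return lines

    # Simplification: replies whose parent is absent from the feed are dropped.
    return [line for r in roots for line in render(r)]


if __name__ == "__main__":
    print("\n".join(build_threads(FEED)))
```

Had the format allowed entries to nest inside one another, the parser would hand the client this tree directly, with no bookkeeping over id references.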

It'll be interesting to see how Google tackles these issues in GData.