GData isn't a Best Practice Implementation of the Atom Publishing Protocol

June 11, 2007

@ 03:00 AM

I recently posted a blog post entitled Why GData/APP Fails as a General Purpose Editing Protocol for the Web which pointed out some limitations in the Atom Publishing Protocol (APP) and Google's implementation of it in GData with regards to being a general purpose protocol for updating data stores on the Web. There were a lot of good responses to my post from developers knowledgeable about APP including the authors of the specification, Bill de hÓra and Joe Gregorio. Below are links to some of these responses

Joe Gregorio: In which we narrowly save Dare from inventing his own publishing protocol
Bill de hÓra: APP on the Web has failed: miserably, utterly, and completely
David Megginson: REST, the Lost Update Problem, and the Sneakernet Test
Bill de hÓra: Social networks, web publishing and strategy tax
James Snell: Silly

There was also a post by Tim Bray entitled So Lame which questions my motives for writing the post and implies that it is some sinister plot by Microsoft to make sure that we use proprietary technologies to lock users in. I guess I should have given more background in my previous post. The fact is that lots of people have embraced building RESTful Web Services in a big way. My primary concern now is that we don't end up seeing umpteen different RESTful protocols from Microsoft [thus confusing our users and ourselves] and instead standardize on one or two. For example, right now we already have Atom+SSE, Web3S and Project Astoria as three completely different RESTful approaches for updating or retrieving data from a Microsoft data source on the Web. In my mind, that's two too many and that's just the stuff we've made public so there could be more. I'd personally like to see us reduce the set of RESTful protocols coming out of Microsoft to one and even better end up reusing existing Web standards, if possible. Of course, this is an aspiration and it is quite possible that all of these protocols are different for a reason (e.g. we have FTP, SMTP, and HTTP which all can be used to transfer files but have very different use cases) and there is no hope for unification let alone picking some existing standard. My previous post was intended to point out the limitations I and others had noticed with using the Atom Publishing Protocol (APP) as a general protocol for updating data stores that didn't primarily consist of authored content. The point of the post was to share these learnings with other developers working in this space and get feedback from the general developer community just in case there was something wrong with my conclusions.

Anyway, back to the title of this post. In my previous post I pointed out to the following limitations of APP as a general purpose protocol for editing Web content

Mismatch with data models that aren't microcontent
Lack of support for granular updates to fields of an item
Poor support for hierarchy

I have to admit that a lot of my analysis was done on GData because I assumed incorrectly that it is a superset of the Atom Publishing Protocol. After a closer reading of the ~~fifteenth~~ most recent draft APP specification spurred by the responses to my post by various members of the Atom community it seems clear that the approaches chosen by Google in GData run counter to the recommendations of Atom experts including both authors of the spec.

For problem #1, the consensus from Atom experts was that instead of trying to map a distinct concept such as a Facebook user to an Atom entry complete with a long list of proprietary extensions to the atom:entry element, one should instead create a specific data format for that type then treat it as a distinct media type that is linked from atom:entry. Thus in the Facebook example from my previous post, one would have a distinct user.xml file and a corresponding atom:entry which linked to it for each user of the system. Contrast this with the use of the gd:ContactSection in an atom:entry for representing a user. It also seems that the GData solution to the problem of what to put in the elements such as atom:author and atom:summary which are required by the specification but make no sense outside of content/microcontent editing scenarios is to omit them. It isn't spec compliant but I guess it is easier than putting in nonsensical values to satisfy some notion of a valid feed.

For problem #2, a number of folks pointed out that conditional PUT requests using ETags and the If-Match header are actually in the spec. This was my oversight since I skipped the section since the title "Caching and Entity Tags" didn't imply that it had anything to do with dealing with the lost update problem. I actually haven't found a production implementation of APP that supports conditional PUTs this shouldn't be hard to implement for services that require this functionality. This definitely makes the lost update problem more tractable. However a model where a client can just say "update the user's status message to X" still seems more straightforward than one where the client says "get the entire user element", "update the user's status message to X on the client", "replace the user on the server with my version of the user", and potentially "there is a version mismatch so merge my version of the user with the most recent version of the user from the server and try again". The mechanism GData uses for solving the lost update problem is available in the documentation topic on Optimistic concurrency (versioning). Instead of using ETags and If-Match, GData appends a version number to the URL to which the client publishes the updated atom:entry and then cries foul if the client publishes to a URL with an old version number. I guess you could consider this a different implementation of conditional PUTs from what is recommended in the most recent version of the APP draft spec.

For problem #3, the consensus seemed to be to use a atom:link elements to show hierarchy similar to what has been done in Atom threading extensions. I don't question the value of linking and think this is a fine approach for the most part. However, the fact is that in certain scenarios [especially high traffic ones] it is better for the client to be able to make requests like "give me the email with message ID 6789 and all the replies in that thread" than "give me all the emails and I'll figure out the hierarchy I'm interested in myself by piecing together link relationships". I notice that GData completely punts on representing hierarchy in the MessageKind construct which is intended for use in representing email messages.

Anyway I've learned my lesson and will treat the Atom Publishing Protocol (APP) and GData as separate protocols instead of using them interchangeably in the future.

Categories: XML Web Services

Tracked by:
"podcast directory" (podcast directory) [Trackback]

« Why GData/APP Fails as a General Purpose... | Home | Microsoft and the Atom Publishing Protoc... »

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for GData isn't a Best Practice Implementation of the Atom Publishing Protocol - Dare Obasanjo's weblog