Recently I wrote a blog post entitled Google GData: A Uniform Web API for All Google Services where I pointed out that Google has standardized on GData (i.e. Google's implementation of the Atom 1.0 syndication format and the Atom Publishing Protocol with some extensions) as the data access protocol for Google's services going forward. In a comment to that post Gregor Rothfuss wondered whether I couldn't influence people at Microsoft to also standardize on GData. The fact is that I've actually tried to do this with different teams on multiple occasions and each time the I've tried, certain limitations in the Atom Publishing Protocol become quite obvious when you get outside of blog editing scenarios for which the protocol was originally designed. For this reason, we will likely standardize on a different RESTful protocol which I'll discuss in a later post. However I thought it would be useful to describe the limitations we saw in the Atom Publishing Protocol which made it unsuitable as the data access protocol for a large class of online services. 

Overview of the Atom's Data Model

The Atom data model consists of collections, entry resources and media resources. Entry resources and media resources are member resources of a collection. There is a handy drawings in section 4.2 of the latest APP draft specification that shows the hierarchy in this data model which is reproduced below.

                    
Member Resources
|
-----------------
| |
Entry Resources Media Resources
|
Media Link Entry

A media resource can have representations in any media type. An entry resource corresponds to an atom:entry element which means it must have an id, a title, an updated date, one or more authors and textual content. Below is a minimal atom:entry element taken from the Atom 1.0 specification


<entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
</entry>

The process of creating and editing resources is covered in section 9 of the current APP draft specification. To add members to a Collection, clients send POST requests to the URI of the Collection. To delete a Member Resource, clients send a DELETE request to its Member URI. While to edit a Member Resource, clients send PUT requests to its Member URI.

Since using PUT to edit a resource is obviously problematic, the specification notes two concerns that developers have to pay attention to when updating resources.

To avoid unintentional loss of data when editing Member Entries or Media Link Entries, Atom Protocol clients SHOULD preserve all metadata that has not been intentionally modified, including unknown foreign markup.
...
Implementers are advised to pay attention to cache controls, and to make use of the mechanisms available in HTTP when editing Resources, in particular entity-tags as outlined in [NOTE-detect-lost-update]. Clients are not assured to receive the most recent representations of Collection Members using GET if the server is authorizing intermediaries to cache them.

The [NOTE-detect-lost-update] points to Editing the Web: Detecting the Lost Update Problem Using Unreserved Checkout which not only talks about ETags but also talks about conflict resolution strategies when faced with multiple edits to a Web document. This information is quite relevant to anyone considering implementing the Atom Publishing Protocol or a similar data manipulation protocol.   

With this foundation, we can now talk about the various problems one faces when trying to use the Atom Publishing Protocol with certain types of Web data stores.

Limitations Caused by the Constraints within the Atom Data Model

The following is a list of problems one faces when trying to utilize the Atom Publishing Protocol in areas outside of content publishing for which it was originally designed.

  1. Mismatch with data models that aren't microcontent: The Atom data model fits very well for representing authored content or microcontent on the Web such as blog posts, lists of links, podcasts, online photo albums and calendar events. In each of these cases the requirement that each Atom entry has an an id, a title, an updated date, one or more authors and textual content can be met and actually makes a lot of sense. On the other hand, there are other kinds online data that don't really fit this model.

    Below is an example of the results one could get from invoking the users.getInfo method in the Facebook REST API.

    
    <user>
        <uid>8055</uid>
        <about_me>This field perpetuates the glorification of the ego.  Also, it has a character limit.</about_me>
        <activities>Here: facebook, etc. There: Glee Club, a capella, teaching.</activities>    
        <birthday>November 3</birthday>
        <books>The Brothers K, GEB, Ken Wilber, Zen and the Art, Fitzgerald, The Emporer's New Mind, The Wonderful Story of Henry Sugar</books>
        <current_location>
    <city>Palo Alto</city>
    <state>CA</state>
    <country>United States</country>
    <zip>94303</zip>
    </current_location>

    <first_name>Dave</first_name>
    <interests>coffee, computers, the funny, architecture, code breaking,snowboarding, philosophy, soccer, talking to strangers</interests>
    <last_name>Fetterman</last_name>
    <movies>Tommy Boy, Billy Madison, Fight Club, Dirty Work, Meet the Parents, My Blue Heaven, Office Space </movies>
    <music>New Found Glory, Daft Punk, Weezer, The Crystal Method, Rage, the KLF, Green Day, Live, Coldplay, Panic at the Disco, Family Force 5</music>
    <name>Dave Fetterman</name>
    <profile_update_time>1170414620</profile_update_time>
    <relationship_status>In a Relationship</relationship_status>
    <religion/>
    <sex>male</sex>
    <significant_other_id xsi:nil="true"/>
    <status>
    <message>Pirates of the Carribean was an awful movie!!!</message> </status> </user>

    How exactly would one map this to an Atom entry? Most of the elements that constitute an Atom entry don't make much sense when representing a Facebook user. Secondly, one would have to create a large number of proprietary extension elements to anotate the atom:entry element to hold all the Facebook specific fields for the user. It's like trying to fit a square peg in a round hole. If you force it hard enough, you can make it fit but it will look damned ugly.

    Even after doing that, it is extremely unlikely that an unmodified Atom feed reader or editing client such as would be able to do anything useful with this Frankenstein atom:entry element. If you are going to roll your own libraries and clients to deal with this Frankenstein element, then it it begs the question of what benefit you are getting from mis using a standardized protocol in this manner?

    I guess we could keep the existing XML format used by the Facebook REST API and treat the user documents as media resources. But in that case, we aren't really using the Atom Publishing Protocol, instead we've reinvented WebDAV. Poorly.

  2. Lack of support for granular updates to fields of an item: As mentioned in the previous section editing an entry requires replacing the old entry with a new one. The expected client interaction with the server is described in section 5.4 of the current APP draft and is excerpted below.

    Retrieving a Resource

    Client                                     Server
    | |
    | 1.) GET to Member URI |
    |------------------------------------------>|
    | |
    | 2.) 200 Ok |
    | Member Representation |
    |<------------------------------------------|
    | |
    1. The client sends a GET request to the URI of a Member Resource to retrieve its representation.
    2. The server responds with the representation of the Member Resource.

    Editing a Resource

    Client                                     Server
    | |
    | 1.) PUT to Member URI |
    | Member Representation |
    |------------------------------------------>|
    | |
    | 2.) 200 OK |
    |<------------------------------------------|
    1. The client sends a PUT request to store a representation of a Member Resource.
    2. If the request is successful, the server responds with a status code of 200.

    Can anyone spot what's wrong with this interaction? The first problem is a minor one that may prove problematic in certain cases. The problem is pointed out in the note in the documentation on Updating posts on Google Blogger via GData which states

    IMPORTANT! To ensure forward compatibility, be sure that when you POST an updated entry you preserve all the XML that was present when you retrieved the entry from Blogger. Otherwise, when we implement new stuff and include <new-awesome-feature> elements in the feed, your client won't return them and your users will miss out! The Google data API client libraries all handle this correctly, so if you're using one of the libraries you're all set.

    Thus each client is responsible for ensuring that it doesn't lose any XML that was in the original atom:entry element it downloaded. The second problem is more serious and should be of concern to anyone who's read Editing the Web: Detecting the Lost Update Problem Using Unreserved Checkout. The problem is that there is data loss if the entry has changed between the time the client downloaded it and when it tries to PUT its changes.

    Even if the client does a HEAD request and compares ETags just before PUTing its changes, there's always the possibility of a race condition where an update occurs after the HEAD request. After a certain point, it is probably reasonable to just go with "most recent update wins" which is the simplest conflict resolution algorithm in existence. Unfortunately, this approach fails because the Atom Publishing Protocol makes client applications responsible for all the content within the atom:entry even if they are only interested in one field.

    Let's go back to the Facebook example above. Having an API now makes it quite likely that users will have multiple applications editing their data at once and sometimes these aplications will change their data without direct user intervention. For example, imagine Dave Fetterman has just moved to New York city and is updating his data across various services. So he updates his status message in his favorite IM client to "I've moved" then goes to Facebook to update his current location. However, he's installed a plugin that synchronizes his IM status message with his Facebook status message. So the IM plugin downloads the atom:entry that represents Dave Fetterman, Dave then updates his address on Facebook and right afterwards the IM plugin uploads his profile information with the old location and his new status message. The IM plugin is now responsible for data loss in a field it doesn't even operate on directly.

  3. Poor support for hierarchy: The Atom data model is that it doesn't directly support nesting or hierarchies. You can have a collection of media resources or entry resources but the entry resources cannot themselves contain entry resources. This means if you want to represent an item that has children they must be referenced via a link instead of included inline. This makes sense when you consider the blog syndication and blog editing background of Atom since it isn't a good idea to include all comments to a post directly children of an item in the feed or when editing the post. On the other hand, when you have a direct parent<->child hierarchical relationship, where the child is an addressable resource in its own right, it is cumbersome for clients to always have to make two or more calls to get all the data they need.

UPDATE: Bill de hÓra responds to these issues in his post APP on the Web has failed: miserably, utterly, and completely and points out to two more problems that developers may encounter while implementing GData/APP.