Recently I wrote a blog post entitled Google
GData: A Uniform Web API for All Google Services where I pointed out that
Google has standardized on GData (i.e. Google's implementation of the Atom 1.0 syndication format and the Atom Publishing Protocol with some extensions) as the data access protocol for Google's services
going forward. In a comment to that post
Gregor Rothfuss wondered whether I
couldn't influence people at Microsoft to also standardize on GData. The fact is that I've actually tried to do this with different teams on multiple occasions, and each time I've tried, certain limitations in the Atom Publishing Protocol become quite obvious once you get outside of the blog editing scenarios for which the protocol was originally designed. For this reason, we will likely standardize on a different RESTful protocol, which I'll discuss in a later post. However, I thought it would be useful to describe the limitations we saw in the Atom Publishing Protocol which made it unsuitable as the data access protocol for a large class of online services.
Overview of the Atom Data Model
The Atom data model consists of collections, entry resources and media resources. Entry resources and media resources are member resources of a collection. There is a handy drawing in section 4.2 of the latest APP draft specification that shows the hierarchy in this data model, which is reproduced below.
                     Member Resources
                            |
               +------------+------------+
               |                         |
        Entry Resources           Media Resources
                                         |
                                  Media Link Entry
A media resource can have representations in any media type. An entry resource corresponds to an atom:entry element, which means it must have an id, a title, an updated date, one or more authors and textual content.
Below is a minimal atom:entry element taken from the Atom 1.0 specification

<entry>
  <title>Atom-Powered Robots Run Amok</title>
  <link href="http://example.org/2003/12/13/atom03"/>
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <summary>Some text.</summary>
</entry>
The process of creating and editing resources is covered in section 9 of the current APP draft specification. To add members to a Collection, clients send POST requests to the URI of the Collection. To delete a Member Resource, clients send a DELETE request to its Member URI. To edit a Member Resource, clients send PUT requests to its Member URI.
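The three operations can be sketched as a toy in-memory collection. This is a simulation of the interaction pattern only; the class, the "/collection" URIs and the return shapes are illustrative, not part of the protocol.

```python
# A toy in-memory model of an APP collection. POST to the collection URI
# creates a member, PUT to a member URI replaces it, DELETE removes it.

class AppCollection:
    def __init__(self, uri="/collection"):
        self.uri = uri
        self.members = {}      # member URI -> entry representation
        self.next_id = 1

    def post(self, entry):
        """POST to the Collection URI creates a new member resource."""
        member_uri = f"{self.uri}/{self.next_id}"
        self.next_id += 1
        self.members[member_uri] = entry
        return 201, member_uri          # 201 Created + Location header value

    def put(self, member_uri, entry):
        """PUT to a Member URI replaces the member's representation."""
        if member_uri not in self.members:
            return 404, None
        self.members[member_uri] = entry
        return 200, member_uri

    def delete(self, member_uri):
        """DELETE to a Member URI removes the member resource."""
        if member_uri not in self.members:
            return 404, None
        del self.members[member_uri]
        return 200, member_uri

collection = AppCollection()
status, uri = collection.post({"title": "First Post"})     # create
collection.put(uri, {"title": "First Post (edited)"})      # edit
collection.delete(uri)                                     # remove
```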
Using PUT to edit a resource in this way is problematic, and the specification notes two concerns that developers have to pay attention to when updating resources.
To avoid unintentional loss of data when editing Member Entries or Media
Link Entries, Atom Protocol clients SHOULD preserve all metadata that has not
been intentionally modified, including unknown foreign markup.
Implementers are advised to pay attention to cache controls, and to make use of the mechanisms available in HTTP when editing Resources, in particular entity-tags as outlined in [NOTE-detect-lost-update]. Clients are not assured to receive the most recent representations of Collection Members using GET if the server is authorizing intermediaries to cache them.
The [NOTE-detect-lost-update] reference points to Editing the Web: Detecting the Lost Update Problem Using Unreserved Checkout, which not only covers ETags but also discusses conflict resolution strategies when faced with multiple edits to a Web document. This information is quite relevant to anyone considering implementing the Atom Publishing Protocol or a similar data manipulation protocol.
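The conditional-update pattern the note recommends can be sketched as follows, using a toy in-memory resource in place of a real HTTP server; the ETag scheme (a content hash) and the class are illustrative assumptions.

```python
# Sketch of conditional update with ETags: a PUT carrying a stale ETag in
# If-Match is rejected with 412 Precondition Failed instead of silently
# overwriting someone else's change.

import hashlib

class Resource:
    def __init__(self, body):
        self.body = body

    @property
    def etag(self):
        # A real server may compute ETags however it likes; a content
        # hash is one simple choice.
        return hashlib.sha1(self.body.encode()).hexdigest()

    def get(self):
        return 200, self.etag, self.body

    def put(self, body, if_match=None):
        # Reject the write if the client's ETag no longer matches.
        if if_match is not None and if_match != self.etag:
            return 412, self.etag, self.body    # 412 Precondition Failed
        self.body = body
        return 200, self.etag, self.body

entry = Resource("<entry><title>Original</title></entry>")
status, etag, body = entry.get()

# Someone else edits the entry after our GET...
entry.put("<entry><title>Edited elsewhere</title></entry>")

# ...so our conditional PUT with the now-stale ETag is rejected.
status, _, _ = entry.put("<entry><title>My edit</title></entry>",
                         if_match=etag)
print(status)   # 412
```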
With this foundation, we can now talk about the various problems one faces when trying to use the Atom Publishing Protocol with certain types of Web services.
Limitations Caused by the Constraints within the Atom Data Model
The following is a list of problems one faces when trying to utilize the Atom Publishing Protocol in areas outside of content publishing for which it was originally designed.
Mismatch with data models that aren't microcontent: The Atom data model fits very well for representing authored content or microcontent on the Web such as blog posts, lists of links, podcasts, online photo albums and calendar events. In each of these cases the requirement that each Atom entry has an id, a title, an updated date, one or more authors and textual content can be met and actually makes a lot of sense. On the other hand, there are other kinds of online data that don't really fit this model.
Below is an example of the results one could get from invoking the
users.getInfo method in the Facebook REST API.
<about_me>This field perpetuates the glorification of the ego. Also, it has a character limit.</about_me>
<activities>Here: facebook, etc. There: Glee Club, a capella, teaching.</activities>
<books>The Brothers K, GEB, Ken Wilber, Zen and the Art, Fitzgerald, The Emporer's New Mind, The Wonderful Story of Henry Sugar</books>
<interests>coffee, computers, the funny, architecture, code breaking,snowboarding, philosophy, soccer, talking to strangers</interests>
<movies>Tommy Boy, Billy Madison, Fight Club, Dirty Work, Meet the Parents, My Blue Heaven, Office Space </movies>
<music>New Found Glory, Daft Punk, Weezer, The Crystal Method, Rage, the KLF, Green Day, Live, Coldplay, Panic at the Disco, Family Force 5</music>
<relationship_status>In a Relationship</relationship_status>
<message>Pirates of the Carribean was an awful movie!!!</message>
How exactly would one map this to an Atom entry? Most of the elements that constitute an Atom entry don't make much sense when representing a Facebook user. Secondly, one would have to create a large number of proprietary extension elements to annotate the atom:entry element to hold all the Facebook-specific fields for the user. It's like trying to fit a square peg in a round hole. If you force it hard enough, you can make it fit, but it will look damned ugly.
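To make the mismatch concrete, here is a sketch of what forcing a few of the Facebook user fields into an atom:entry might look like. The extension namespace and the values invented for the required Atom elements are illustrative assumptions, not anything Facebook or Atom defines.

```python
# Sketch: squeezing a user-profile record into an atom:entry. Atom forces
# us to invent values for its required elements, and every real field ends
# up as proprietary foreign markup a generic Atom client will ignore.

import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
FB = "http://example.com/facebook"   # hypothetical extension namespace

user = {
    "about_me": "This field perpetuates the glorification of the ego.",
    "relationship_status": "In a Relationship",
    "status": "Pirates of the Carribean was an awful movie!!!",
}

entry = ET.Element(f"{{{ATOM}}}entry")
# Required Atom elements with no natural meaning for a user profile:
ET.SubElement(entry, f"{{{ATOM}}}id").text = "urn:fbuid:8055"
ET.SubElement(entry, f"{{{ATOM}}}title").text = "Facebook user 8055"
ET.SubElement(entry, f"{{{ATOM}}}updated").text = "2007-06-12T00:00:00Z"
author = ET.SubElement(entry, f"{{{ATOM}}}author")
ET.SubElement(author, f"{{{ATOM}}}name").text = "Facebook user 8055"

# The data we actually care about, exiled into extension elements:
for field, value in user.items():
    ET.SubElement(entry, f"{{{FB}}}{field}").text = value

xml = ET.tostring(entry, encoding="unicode")
```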
Even after doing that, it is extremely unlikely that an unmodified Atom feed reader or editing client would be able to do anything useful with such an atom:entry element. If you are going to roll your own libraries and clients to deal with this Frankenstein element, then it begs the question of what benefit you are getting from using a standardized protocol in this manner.
I guess we could keep the existing XML format used by the Facebook REST API and treat the user documents as media resources. But in that case, we aren't really using the Atom Publishing Protocol; instead we've reinvented a generic protocol for shuttling opaque XML documents over HTTP.
Lack of support for granular updates to fields of an item:
As mentioned in the previous section, editing an entry requires replacing the old entry with a new one. The expected client interaction with the server is described in section 5.4 of the current APP draft and is excerpted below.
Retrieving a Resource

Client                                     Server
  |                                           |
  |  1.) GET to Member URI                    |
  |------------------------------------------>|
  |                                           |
  |  2.) 200 Ok                               |
  |      Member Representation                |
  |<------------------------------------------|
  |                                           |

- The client sends a GET request to the URI of a Member Resource to retrieve its representation.
- The server responds with the representation of the Member Resource.
Editing a Resource

Client                                     Server
  |                                           |
  |  1.) PUT to Member URI                    |
  |      Member Representation                |
  |------------------------------------------>|
  |                                           |
  |  2.) 200 OK                               |
  |<------------------------------------------|
  |                                           |

- The client sends a PUT request to store a representation of a Member Resource.
- If the request is successful, the server responds with a status code of 200.
Can anyone spot what's wrong with this interaction? The first problem is a relatively minor one, though it may prove problematic in certain cases. The problem is pointed out in the note in the documentation on Updating posts on Google Blogger via GData
IMPORTANT! To ensure forward compatibility, be sure that when you
POST an updated entry you preserve all the XML that was present when you retrieved the entry from Blogger. Otherwise, when we implement new stuff and include
<new-awesome-feature> elements in the feed, your client won't return them and your users will miss out! The Google data API client libraries all handle this correctly, so if you're using one of the libraries you're all set.
Thus each client is responsible for ensuring that it doesn't lose any XML that
was in the original
atom:entry element it downloaded.
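A sketch of the preservation burden this places on clients, using a hypothetical <new-awesome-feature> extension element: the unknown markup only survives the round trip because the client mutates the parsed document in place instead of rebuilding the entry from the fields it understands.

```python
# Sketch: a client edits one field of an entry and must round-trip every
# element it does not understand, including foreign markup from
# namespaces it has never seen.

import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)   # keep Atom as the default namespace

# The entry as served, containing an extension element the client
# knows nothing about (<new-awesome-feature> is made up for illustration).
served = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:x="http://example.com/ext">
  <id>urn:example:1</id>
  <title>Old title</title>
  <x:new-awesome-feature>shiny</x:new-awesome-feature>
</entry>"""

entry = ET.fromstring(served)

# Edit only the title, leaving the rest of the tree untouched...
entry.find(f"{{{ATOM}}}title").text = "New title"

# ...then PUT the whole document back. The unknown element survives only
# because we preserved the parts of the tree we didn't understand.
body = ET.tostring(entry, encoding="unicode")
```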
The second problem is more serious and should be of concern to anyone who's read Editing the Web: Detecting the Lost Update Problem Using Unreserved Checkout. The problem is that data loss occurs if the entry has changed between the time the client downloaded it and when it tries to PUT its changes. Even if the client does a HEAD request and compares ETags just before PUTing its changes, there's always the possibility of a race condition where an update occurs after the HEAD request. After a certain point, it is probably reasonable to just go with "most recent update wins", which is the simplest conflict resolution algorithm in existence. Unfortunately, this approach fails because the Atom Publishing Protocol makes client applications responsible for all the content of the atom:entry element even if they are only interested in one field.
Let's go back to the Facebook example above. Having an API now makes it quite likely that users will have multiple applications editing their data at once, and sometimes these applications will change their data without direct user intervention. For example, imagine Dave Fetterman has just moved to New York City and is updating his data across various services. So he updates his status message in his favorite IM client to "I've moved", then goes to Facebook to update his current location.
However, he's installed a plugin that synchronizes his IM status message with his Facebook status message. So the IM plugin downloads the atom:entry that represents Dave Fetterman. Dave then updates his address on Facebook, and right afterwards the IM plugin uploads his profile information with the old location and his new status message. The IM plugin is now responsible for data loss in a field it doesn't even operate on directly.
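The scenario can be simulated in a few lines. The field names and the dictionary "server" are illustrative simplifications; the point is the whole-entry replacement model that full-document PUT imposes.

```python
# Toy simulation of the IM-plugin scenario: both clients edit whole-entry
# snapshots, so the slower writer silently reverts the other's change.

server = {"status": "at work", "location": "San Francisco"}

def get():
    return dict(server)      # client downloads the full entry

def put(entry):
    server.clear()           # full replacement, per the APP editing model
    server.update(entry)

# The IM plugin grabs a snapshot in order to sync the status message.
im_snapshot = get()
im_snapshot["status"] = "I've moved"

# Meanwhile Dave updates his location through the website.
dave_snapshot = get()
dave_snapshot["location"] = "New York"
put(dave_snapshot)

# The IM plugin now uploads its stale snapshot: the status change lands,
# but Dave's new location is silently overwritten.
put(im_snapshot)
print(server["location"])   # San Francisco -- the move was lost
```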
Poor support for hierarchy: A limitation of the Atom data model is that it doesn't directly support nesting or hierarchies. You can have a collection of media resources or entry resources, but the entry resources cannot themselves contain entry resources. This means if you want to represent an item that has children, they must be referenced via a link instead of included inline. This makes sense when you consider the blog syndication and blog editing background of Atom, since it isn't a good idea to include all the comments on a post directly as children of the item in the feed or when editing the post. On the other hand, when you have a direct parent<->child hierarchical relationship, where the child is an addressable resource in its own right, it is cumbersome for clients to always have to make two or more calls to get all the data they need.
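The cost in round trips can be sketched as follows, with made-up URIs and an in-memory stand-in for HTTP GET: because the parent entry carries only a link to its children, a client always needs at least one extra request to assemble the full picture.

```python
# Sketch: link-based hierarchy forces N extra fetches. The parent entry
# exposes its comments via a "replies" link rather than inline children.

FETCHES = []   # record every request the client has to make

RESOURCES = {
    "/entries/1": {"title": "A post",
                   "replies": "/entries/1/comments"},   # a link, not data
    "/entries/1/comments": [{"title": "First comment"},
                            {"title": "Second comment"}],
}

def http_get(uri):
    """Stand-in for an HTTP GET against the server."""
    FETCHES.append(uri)
    return RESOURCES[uri]

# Getting a post *and* its comments always costs at least two requests.
post = http_get("/entries/1")
comments = http_get(post["replies"])
print(len(FETCHES))   # 2
```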
UPDATE: Bill de hÓra responds to these issues in his post APP on the Web has failed: miserably, utterly, and completely and points out two more problems that developers may encounter while implementing GData/APP.