I had hoped to avoid talking about RESTful Web services for a couple of weeks, but Yaron Goland's latest blog post, "APP and Dare, the sitting duck", deserves a mention. In his post, Yaron talks concretely about some of the thinking that has gone on at Windows Live and other parts of Microsoft around building a RESTful protocol for accessing and manipulating data stores on the Web. He writes:

I'll try to explain what's actually going on at Live. I know what's going on because my job for a little over the last year has been to work with Live groups designing our platform strategy.
...
Most of the services in Live land follow a very similar design pattern, what I originally called S3C which stood for Structured data, with some kind of Schema (in the general sense, I don't mean XML Schema), with some kind of Search and usually manipulated with operations that look rather CRUD like. So it seemed fairly natural to figure out how to unify access to those services with a single protocol.
...
So with this in mind we first went to APP. It's the hottest thing around. Yahoo, Google, etc. everyone loves it. And as Dare pointed out in his last article Microsoft has adopted it and will continue to adopt it where it makes sense. There was only one problem - we couldn't make APP work in any sane way for our scenarios. In fact, after looking around for a bit, we couldn't find any protocol that really did what we needed. Because my boss hated the name S3C we renamed the spec Web3S and that's the name we published it under. The very first section of the spec explains our requirements. I also published a FAQ that explains the design rationale for Web3S. And sure enough, the very first question, 2.1, explains why we didn't use ATOM.
...
Why not just modify APP?
We considered this option but the changes needed to make APP work for our scenarios were so fundamental that it wasn't clear if the resulting protocol would still be APP. The core of ATOM is the feed/entry model. But that model is what causes us our problems. If we change the data model are we still dealing with the same protocol? I also have to admit that I was deathly afraid of the political implications of Microsoft messing around with APP. I suspect Mr. Bray's comments would be taken as a pleasant walk in the park compared to the kind of pummeling Microsoft would receive if it touched one hair on APP's head.

In his post, Yaron talks about two of the key limitations we saw with the Atom Publishing Protocol (i.e. lack of support for hierarchies and lack of support for granular updates to fields) and responds to the various suggestions about how one can workaround these problems in APP. As he states in the conclusion of his post we are very wary of suggestions to "embrace and extend" Web standards given the amount of negative press the company has gotten about that over the years. It seems better for the industry if we build a protocol that works for our needs and publish documentation about how it works so any interested party can interoperate with us than if we claim we support a Web standard when in truth it only "works with Microsoft" because it has been extended in incompatible ways.

Dealing with Hierarchy

Here's what Yaron had to say with regard to the discussion around APP's lack of explicit support for hierarchies:

The idea that you put a link in the ATOM feed to the actual object. This isn't a bad idea if the goal was to publish notices about data. E.g. if I wanted to have a feed that published information about changes to my address book then having a link to the actual address book data in the APP entries is fine and dandy. But if the goal is to directly manipulate the address book's contents then having to first download the feed, pull out the URLs for the entries and then retrieve each and every one of those URLs in separate requests in order to pull together all the address book data is unacceptable from both an implementation simplicity and performance perspective. We need a way whereby someone can get all the content in the address book at once. Also, each of our contacts, for example, are actually quite large trees. So the problem recurses. We need a way to get all the data in one contact at a go without having to necessarily pull down the entire address book. At the next level we need a way to get all the phone numbers for a single contact without having to download the entire contact and so on.

Yaron is really calling out two issues here. The first is that if a data type doesn't map well to a piece of authored content, it is better represented as its own content type linked from an atom:entry than shoehorned into an atom:entry, with its required author, summary and title fields. The second issue is the lack of explicit support for hierarchies. This is an example of how something that seems entirely reasonable in one scenario can be problematic in another. If you are editing blog posts, it probably isn't much of a burden to first retrieve an Atom feed of your recent posts, locate the link to the one you want to edit, then retrieve it for editing. In addition, since a blog post is authored content, the most relevant information about the post can be summarized in the atom:entry. On the other hand, if you want to retrieve your list of IM buddies so you can view their online status, or get the people in your friends list so you can see their recent status updates, it isn't pretty to fetch a feed of your contacts and then retrieve each contact one by one after locating the links to their representations in the Atom feed. Also, if you only want a contact's status message or online status, you may want to address just that part of the data instead of retrieving an entire user object.

Below are specification excerpts showing how two RESTful protocols from Microsoft address these issues.

How Web3S Does It

The naming of elements and the level of hierarchy in an XML document that is accessible via Web3S can be arbitrarily complex as long as the document satisfies some structural constraints specified in The Web3S Resource Infoset. The constraints include a prohibition on mixed content and a requirement that multiple child elements with the same name under a node be identified by a Web3S:ID element (e.g. multiple entries under a feed are identified by ID). Thus the representation of a Facebook user returned by the users.getInfo method in the Facebook REST API should be a valid Web3S document, except that the concentration element would have to be changed from having string content to having two element children: a Web3S:ID that can be used to address each concentration directly and another containing the current textual content.

The most important part of being able to properly represent hierarchies is that different levels of the hierarchy can be directly accessed. From the Web3S documentation section entitled "Addressing Web3S Information Items in HTTP":

In order to enable maximum flexibility Element Information Items (EIIs) are directly exposed as HTTP resources. That is, each EII can be addressed as a HTTP resource and manipulated with the usual methods...


<articles>
 <article>
  <Web3S:ID>8383</Web3S:ID>
  <title>Manual of Surgery Volume First: General Surgery. Sixth Edition.</title>
  <authors>
   <author>
    <Web3S:ID>23455</Web3S:ID>
    <firstname>Alexander</firstname>
    <lastname>Miles</lastname>    
   </author>
   <author>
    <Web3S:ID>88828</Web3S:ID>
    <firstname>Alexis</firstname>
    <lastname>Thomson</lastname>    
   </author>
  </authors>
 </article>
</articles>

If the non-Web3S prefix path is http://example.net/stuff/morestuff then we could address the lastname EII in Alexander Miles’s entry as http://example.net/stuff/morestuff/net.examples.articles/net.example.article(8383)/net.example.authors/net.example.author(23455)/org.example.lastname.

 Although String Information Items (SIIs) are modeled as resources they currently do not have their own URLs and therefore are addressed only in the context of EIIs. E.g. the value of an SII would be set by setting the value of its parent EII.
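The addressing rule in the excerpt is mechanical: each EII on the path contributes a segment made of its qualified name, followed by its Web3S:ID in parentheses when it has one. Below is a minimal Python sketch of that rule. The function name web3s_url is hypothetical, the example uses a net.example. prefix on every segment for simplicity, and percent-escaping the IDs is an assumption, since the excerpt doesn't say how reserved characters in an ID are handled.

```python
from urllib.parse import quote


def web3s_url(base, steps):
    """Build a Web3S-style URL from (qualified_name, web3s_id_or_None) steps.

    Hypothetical helper: escaping IDs with quote() is an assumption, not
    something the Web3S excerpt specifies.
    """
    segments = [base.rstrip("/")]
    for name, eii_id in steps:
        if eii_id is None:
            segments.append(name)  # singleton child: addressed by name alone
        else:
            segments.append(f"{name}({quote(str(eii_id), safe='')})")
    return "/".join(segments)


url = web3s_url(
    "http://example.net/stuff/morestuff",
    [
        ("net.example.articles", None),
        ("net.example.article", "8383"),
        ("net.example.authors", None),
        ("net.example.author", "23455"),
        ("net.example.lastname", None),
    ],
)
print(url)
```

Because every repeated element carries an ID, the URL names exactly one EII no matter how the store happens to order its children.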

XML heads may balk at requiring IDs to differentiate elements with the same name at the same scope or level of hierarchy instead of using positional indexes as XPath does. The problem with positional indexes is that they assume XML document order is significant in the underlying data store, which may well not be the case.
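To make the ordering argument concrete, here is a small Python sketch using xml.etree.ElementTree. It assumes, hypothetically, that a data store serializes the two authors from the articles example in a different order on two requests. The namespace URI bound to the Web3S prefix is made up, since the excerpt doesn't show the real one. A positional lookup like XPath's author[1] returns a different author each time, while lookup by Web3S:ID is stable.

```python
import xml.etree.ElementTree as ET

W3S_NS = "urn:example:web3s"  # hypothetical namespace URI for the Web3S prefix
ID_TAG = f"{{{W3S_NS}}}ID"

MILES = "<author><Web3S:ID>23455</Web3S:ID><lastname>Miles</lastname></author>"
THOMSON = "<author><Web3S:ID>88828</Web3S:ID><lastname>Thomson</lastname></author>"


def authors(*children):
    """Wrap author elements in an <authors> element that declares the prefix."""
    return ET.fromstring(f'<authors xmlns:Web3S="{W3S_NS}">{"".join(children)}</authors>')


def by_position(parent, index):
    """XPath-style, 1-based positional lookup."""
    return parent.find(f"author[{index}]")


def by_id(parent, eii_id):
    """Web3S-style lookup by the Web3S:ID child element."""
    for author in parent.findall("author"):
        if author.findtext(ID_TAG) == eii_id:
            return author
    return None


# The same data, serialized in two different orders on two requests.
first = authors(MILES, THOMSON)
second = authors(THOMSON, MILES)

print(by_position(first, 1).findtext("lastname"))   # Miles
print(by_position(second, 1).findtext("lastname"))  # Thomson
print(by_id(first, "23455").findtext("lastname"))   # Miles
print(by_id(second, "23455").findtext("lastname"))  # Miles
```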

Supporting Granular Updates

Here's what Yaron had to say on the topic of supporting granular updates, and on the various suggestions that came up with regard to preventing the lost update problem in APP:

APP's approach to this problem is to have the client download all the content, change the stuff they understand and then upload all the content including stuff they don't understand.
...
On a practical level though the 'download then upload what you don't understand' approach is complicated. To make it work at all one has to use optimistic concurrency. For example, let's say I just want to change the first name of a contact and I want to use last update wins semantics. E.g. I don't want to use optimistic concurrency. But when I download the contact I get a first name and a last name. I don't care about the last name. I just want to change the first name. But since I don't have merge semantics I am forced to upload the entire record including both first name and last name. If someone changed the last name on the contact after I downloaded but before I uploaded I don't want to lose that change since I only want to change the first name. So I am forced to get an etag and then do an if-match and if the if-match fails then I have to download again and try again with a new etag. Besides creating race conditions I have to take on a whole bunch of extra complexity when all I wanted in the first place was just to do a 'last update wins' update of the first name.
...
A number of folks seem to agree that merge makes sense but they suggested that instead of using PUT we should use PATCH. Currently we use PUT with a specific content type (application/Web3S+xml). If you execute a PUT against a Web3S resource with that specific content-type then we will interpret the content using merge semantics. In other words by default PUT has replacement semantics unless you use our specific content-type on a Web3S resource. Should we use PATCH? I don't think so but I'm flexible on the topic.
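The etag and If-Match dance Yaron describes can be sketched in Python with an in-memory stand-in for the server. Everything here is hypothetical and exists only to illustrate the retry loop a client is forced into when the only update operation is a full replacement guarded by a precondition: ContactStore, set_first_name, the version-based ETag format, and the simulated concurrent edit are all illustrative names, not parts of APP or Web3S.

```python
class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class ContactStore:
    """Hypothetical server holding one contact; PUT replaces the whole record."""

    def __init__(self, record):
        self.record = dict(record)
        self.version = 1

    def etag(self):
        return f'"v{self.version}"'

    def get(self):
        # GET returns a copy of the full record plus its current ETag.
        return dict(self.record), self.etag()

    def put(self, record, if_match):
        # PUT with If-Match: reject the write if the record changed since GET.
        if if_match != self.etag():
            raise PreconditionFailed()
        self.record = dict(record)
        self.version += 1


def set_first_name(store, first_name, max_attempts=5, after_get=None):
    """Change only the first name without clobbering concurrent edits."""
    for _ in range(max_attempts):
        record, etag = store.get()
        if after_get is not None:
            after_get()  # test hook: lets another writer sneak in here
        record["firstname"] = first_name  # the last name must be resent too
        try:
            store.put(record, if_match=etag)
            return
        except PreconditionFailed:
            continue  # someone else wrote first: download again and retry
    raise RuntimeError("gave up after repeated conflicts")


# Simulate another client changing the last name between our GET and PUT.
store = ContactStore({"firstname": "Alex", "lastname": "Miles"})
interrupted = []


def concurrent_edit():
    if not interrupted:
        interrupted.append(True)
        other, tag = store.get()
        other["lastname"] = "Myles"
        store.put(other, if_match=tag)


set_first_name(store, "Alexander", after_get=concurrent_edit)
print(store.record)  # {'firstname': 'Alexander', 'lastname': 'Myles'}
```

Without the If-Match check, the first PUT would silently revert the last name to "Miles". With the check, the client pays for an extra download and a retry loop just to change one field.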

This is one place where a number of APP experts such as Bill de hÓra and James Snell seem to agree that the current semantics in APP are insufficient. There also seems to be some consensus that it is too early to standardize a technology for partial updates of XML on the Web without lots more implementation experience. I also agree with that sentiment. So having it out of APP for now probably isn't a bad thing.

Currently I'm still torn on whether Web3S's use of PUT for submitting partial updates is kosher or whether it is more appropriate to invent a new HTTP method called PATCH. There was a thread about this on the rest-discuss mailing list, and for the most part it seems people felt that applying merge semantics on PUT requests for a specific media type is valid if the server understands that those are the semantics of that type.
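As a minimal sketch of what "the server understands the semantics of that type" means in practice, here is a hypothetical Python PUT handler. It uses flat dicts to stand in for Web3S infosets, whereas a real merge would recurse through the tree. The function name handle_put is illustrative. The only rule taken from the text is that a PUT carrying application/Web3S+xml is merged, while any other content type replaces the resource. Media type names compare case-insensitively, so the check lowercases them.

```python
WEB3S_MEDIA_TYPE = "application/web3s+xml"


def handle_put(current, body, content_type):
    """Return the new state of a resource after a PUT (hypothetical sketch)."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == WEB3S_MEDIA_TYPE:
        # Merge semantics: fields in the body overwrite, all others survive.
        merged = dict(current)
        merged.update(body)
        return merged
    # Default PUT semantics: the body replaces the resource wholesale.
    return dict(body)


contact = {"firstname": "Alex", "lastname": "Miles"}
print(handle_put(contact, {"firstname": "Alexander"}, "application/Web3S+xml"))
# {'firstname': 'Alexander', 'lastname': 'Miles'}
print(handle_put(contact, {"firstname": "Alexander"}, "application/xml"))
# {'firstname': 'Alexander'}
```

With merge semantics, a client can make a "last update wins" change to the first name in a single request, with no ETag and no retry loop. The flip side, as Robert Sayre points out in the comments, is that a generic server that doesn't know about the media type will treat the same request as a replacement.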

How Web3S Does It

From the Web3S documentation section entitled "Application/Web3S+xml with Merge Semantics":

On its own the Application/Web3S+xml content type is used to represent a Web3S infoset. But the semantics of that infoset can change depending on what method it is used with.

In the case of PUT the semantics of the Application/Web3S+xml request body are “merge the infoset information in the Application/Web3S+xml request with the infoset of the EII identified in the request-URI.” This section defines how Application/Web3S+xml is to be handled specifically in the case of PUT or any other context in which the Web3S infoset in the Application/Web3S+xml serialization is to be merged with some existing Web3S infoset.

For example, imagine that the source contains:

 <whatever>
  <Web3S:ID>234</Web3S:ID>
  <yo>
   <Web3S:ID>efghi</Web3S:ID>
   <avalue />
   <somethingElse>YO!!!</somethingElse>
  </yo>
 </whatever>
Now imagine that the destination, before the merge, contains:

 <whatever>
  <nobodyhome />
 </whatever> 
In this example the only successful outcome of the merge would have to be:

 <whatever>
  <Web3S:ID>234</Web3S:ID>
  <yo>
   <Web3S:ID>efghi</Web3S:ID>
   <avalue />
   <somethingElse>YO!!!</somethingElse>
  </yo>
  <nobodyhome />
 </whatever>
In other words, not only would all of the source’s contents have to be copied over but the full names (E.g. EII names and IDs) must also be copied over exactly.
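The merge rule in the excerpt can be sketched with Python's xml.etree.ElementTree. This is an illustration under stated assumptions, not the spec's algorithm:

- Children are matched by element name plus Web3S:ID, since the infoset constraints make that pair unique among siblings.
- Unmatched source children are deep-copied into the destination.
- A matched leaf takes the source's value.

The namespace URI bound to the Web3S prefix is hypothetical. Because Web3S does not treat document order as significant, copied children are simply appended.

```python
import copy
import xml.etree.ElementTree as ET

W3S_NS = "urn:example:web3s"  # hypothetical; the excerpt omits the real URI
ID_TAG = f"{{{W3S_NS}}}ID"


def identity(elem):
    """Identify an EII among its siblings by name plus Web3S:ID (if any)."""
    return (elem.tag, elem.findtext(ID_TAG))


def merge(dest, src):
    """Merge src's children into dest in place and return dest."""
    existing = {identity(child): child for child in dest}
    for child in src:
        match = existing.get(identity(child))
        if match is None:
            # New item: copy it over with its names and IDs intact.
            dest.append(copy.deepcopy(child))
        elif len(child) == 0:
            # Leaf EII: the source's value wins.
            match.text = child.text
        else:
            merge(match, child)
    return dest


source = ET.fromstring(
    f'<whatever xmlns:Web3S="{W3S_NS}">'
    "<Web3S:ID>234</Web3S:ID>"
    "<yo><Web3S:ID>efghi</Web3S:ID><avalue /><somethingElse>YO!!!</somethingElse></yo>"
    "</whatever>"
)
destination = ET.fromstring("<whatever><nobodyhome /></whatever>")
merge(destination, source)

ET.register_namespace("Web3S", W3S_NS)
print(ET.tostring(destination, encoding="unicode"))
```

Running the merge a second time leaves the destination unchanged, because every source child now finds a match by name and ID.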

This is an early draft of the spec, so there are a lot of rules that aren't explicitly spelled out, but now you should get the gist of how Web3S works. If you have any questions, direct them to Yaron, not to me. I'm just an interested observer when it comes to Web3S. Yaron is the person to talk to if you want to make things happen. :)

In a couple of days I'll take a look at how Project Astoria deals with the same problems in a curiously similar fashion. Until then you can make do with Leonard Richardson's excellent review of Project Astoria. Until next time.


 

Saturday, June 16, 2007 9:06:06 PM (GMT Daylight Time, UTC+01:00)
When you investigated how to support hierarchical structures within Atom did you consider whether it might be possible to consolidate a distributed Atom hierarchy (Atom entries referencing external Atom feeds to support lists of lists) using the hAtom microformat? I believe that you can create an Atom entry that contains hAtom content and that each of these hAtom payloads can in turn support hAtom content within each of its entries. Might this type of solution provide the type of recursion that your scenarios call for?

I hope that MS is not still under the illusion that they have the juice required to successfully introduce yet another competitor to RSS and Atom. It appears that the goodwill which MS established from the introduction of the SSE and SLE extensions has worn as thin as the level of support MS has so far provided to the community to promote these formats.
scott
Saturday, June 16, 2007 9:13:33 PM (GMT Daylight Time, UTC+01:00)
scott,
I fail to see how Web3S is a competitor to the RSS or Atom syndication formats which are both well supported by all Microsoft feed reading applications.

As for your other question, the post above already answers it several times over.
Sunday, June 17, 2007 1:43:35 AM (GMT Daylight Time, UTC+01:00)
"XML heads may balk at requiring IDs to differentiate elements with the same name at the same scope or level of hierarchy instead of using positional indexes like XPath does. The problem with is that assumes that the XML document order is significant in the underlying data store which may likely not be the case."

I can't imagine what kind of wrong-headed thinking would lead to someone balking at that (though I'll defer to your far greater experience in dealing with users of XML libraries), but I can imagine plenty of concern about this one: why aren't they URIs?
Sunday, June 17, 2007 5:38:28 AM (GMT Daylight Time, UTC+01:00)
"it seems people felt that applying merge semantics on PUT requests for a specific media type is valid if the server understands that those are the semantics of that type."

That is a very mealy-mouthed way of saying the meaning of the message is dependent on the server implementation. Sound RESTful to you?
Robert Sayre
Sunday, June 17, 2007 6:01:06 AM (GMT Daylight Time, UTC+01:00)
Robert Sayre,
It can be declared that the merge semantics are expected from the server when PUTing resources of a certain media type. However, the problem is that this puts an onus on servers to treat that media type specially, which likely won't be the case with the default WebDAV implementation in IIS or Apache, for example.

This is why I tend to prefer using an explicit PATCH method instead of PUT. However adding a new HTTP method for this scenario also has its issues.
Sunday, June 17, 2007 11:48:42 AM (GMT Daylight Time, UTC+01:00)
Good to see Microsoft thinking RESTful and making Live datastores (contacts, spaces, etc.) accessible via APP. Bad that HTTP Query gets no mention in your post: is that because it's GData, i.e. APP + HTTP Query, and from Google?

Wouldn't "Query" overcome the issue of having to make a full HTTP GET request? This allows granular query of resources and has an Atom Update method.

The ugly is that Microsoft still have this only invented in Seattle mentality just as Google is fast getting the same myopic focus in Mountain View. Why can't you both participate with Tim Bray to get APP 2.0 to adopt your extensions - if they have merit?

Making a new MIME type and a new HTTP method is bad enough. This project has a single aim, to make Live Contacts accessible via HTTP UPDATE, but it is not a universal solution. Why not include this work in MS Astoria?

I would also kill off Web3S. As the FAQ says at the beginning "Web3S is a means, not an end and we will happily abandon it if we can find a consensus protocol that meets our needs."

I hope you can find the consensus ...

Sunday, June 17, 2007 8:54:33 PM (GMT Daylight Time, UTC+01:00)
Sam,
I try to separate search from query. Search is "find the string 'georgia tech' in any one or more of the text properties of my friends" while query is more like "find all users who have 'georgia tech' in their [college] property and graduated between 2000 and 2002".

With GData, Google describes specific URI parameters for passing search or query parameters and uses OpenSearch elements in the results for indicating number of search results, number of pages of results, etc.

I admit I haven't thought much about generalized search and my thoughts on query lean towards using URL based addressing similar to XPath or what Astoria has done. The way GData handles query seems very specific to their situation with regards to URI parameters but the use of OpenSearch seems generally applicable. I'll chat with folks at work about this topic when I get to work this week.

The Astoria folks and the Web3S folks are talking. It's still too early to tell if there'll be consensus and what form it will take. I personally hope that we see more consolidation but only time will tell.
Sunday, June 17, 2007 9:05:21 PM (GMT Daylight Time, UTC+01:00)
Dare,

PATCH wouldn't be a "new" HTTP method. It is defined in RFC2068. What's needed is an updated RFC for it, along with media type registrations for some useful patch formats, minimally the thing produced by "diff", and potentially one format allowing seek/copy/truncate/write inside binary objects.

Best regards, Julian
Julian Reschke
Monday, June 18, 2007 10:37:50 PM (GMT Daylight Time, UTC+01:00)
What about cases where the same author writes more than one article?

That one "real" author would show up as two EII elements with different IDs and different URIs.

I'm still trying to understand this, and haven't read all of the materials.

My latest blog entries are exploring this, and I hope I'm wrong in my analysis.

If I'm right, then Web3S has gained hierarchy by throwing away identity. Not a good trade on the web.
Monday, June 18, 2007 10:57:37 PM (GMT Daylight Time, UTC+01:00)
Speaking just for myself and not my employer, I think Web3S is crap.

Some parts of it make sense. These happen to be the obvious parts that the WebData team (and many others) solved years ago.

All the other parts of it are just "huh?". I mean, parentheses instead of brackets - cool but who cares? UPDATE is unnecessary -- it's just there to solve the "problem" of intermediate states that are schema-invalid. UpdateGrams went through the same pain back in '99. Web3S URLs are horrible. As John pointed out, there are problems with identity.

Yaron's a smart guy, and he's identified and overcome some deficiencies in APP, but I think Web3S is overly complex and has no future.

In contrast, Astoria is much simpler, and has some interesting ideas behind it. I'm glad he's reached out to their team, after I suggested it to him a few weeks ago. At the time, Yaron indicated he had not had any contact with them.

So if Adam Bosworth was on to all this stuff 10 years ago and it's just now taking off, does that mean that healthcare's going to be the hotness 10 years from now?
Friday, June 22, 2007 5:07:55 PM (GMT Daylight Time, UTC+01:00)
On 5/3/2007 I publicly stated (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1553855&SiteID=1) that we were coordinating Web3S with Astoria. Michael Brundage and I met on 5/21/2007. I can't imagine why I would have denied on 5/21 in private what I said on 5/3 in public. But if I miscommunicated Web3S's relationship with Astoria to Michael I apologize, it was not my intent.

To be clear - Astoria and Web3S are unifying our systems and have been working hard on this since May.

I am glad Michael likes Astoria and of course sad that he thinks Web3S is crap. But I take Michael's admiration of Astoria as a compliment to Web3S because one of the outcomes of the numerous Astoria/Web3S coordination meetings is the discovery that the Astoria and Web3S protocol models are essentially identical. It has made combining the systems very easy. So anyone who likes Astoria by definition likes Web3S.