More Thoughts on an HTTP PATCH and AtomPub

February 23, 2008

@ 04:00 AM

Sam Ruby has an insightful response to Joe Gregorio in his post APP Level Patch, where he writes:

Joe Gregorio: At Google we are considering using PATCH. One of the big open questions surrounding that decision is XML patch formats. What have you found for patch formats and associated libraries?

I believe that looking for an XML patch format is looking for a solution at the wrong meta level. Two examples, using AtomPub:

In Atom, the order of elements in an entry is not significant. AtomPub servers often do not store their data in XML serialized form, or even in DOM form. If you PUT an entry, and then send a PATCH based on the original serialization, it may not be understood.
A lot of data in this world is either not in XML, or if it is in XML, is simply there via tunneling. Atom elements are often merely thin wrappers around HTML. HTML has a DOM, and can be flattened into a sequence of SAX-like events, just like XML can be.
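
To make Sam's first point concrete, here is a minimal Python sketch (mine, not Sam's). It assumes a server that hands back the same entry with its children in a different order; the two documents, the urn:example id and the helper names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# What the client originally PUT (hypothetical entry).
original = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:entry-1</id>
  <title>First draft</title>
  <updated>2008-02-23T04:00:00Z</updated>
</entry>"""

# The same entry as a server might re-serialize it from its own storage.
reserialized = """<entry xmlns="http://www.w3.org/2005/Atom">
  <updated>2008-02-23T04:00:00Z</updated>
  <id>urn:example:entry-1</id>
  <title>First draft</title>
</entry>"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


def second_child(xml_text):
    """Name of the element a positional patch ('second child') would hit."""
    return local_name(ET.fromstring(xml_text)[1])


def as_unordered(xml_text):
    """View the entry as an order-insensitive list of (name, text) pairs."""
    root = ET.fromstring(xml_text)
    return sorted((local_name(c), (c.text or "").strip()) for c in root)


# A patch that says "replace the second child" targets different elements...
positional_mismatch = second_child(original) != second_child(reserialized)
# ...even though, as far as Atom is concerned, this is the same entry.
same_entry = as_unordered(original) == as_unordered(reserialized)
```

A patch computed against the first serialization and expressed by position would edit the title on the client's copy but the id on the server's copy.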

I totally agree with Sam. A generic “XML patch format” is the wrong solution. At Microsoft we had several different XML patch formats produced by the same organization because each targeted a different scenario:

Diffgram: Represent a relational database table and changes to it as XML.
UpdateGram: Represent changes to an XML view of one or more relational database tables optionally including a mapping from relational <-> XML data
Patchgram: Represent infoset level differences between two XML documents

Of course, these are one-line summaries, but you get the point. Depending on your constraints, you’ll end up with a different set of requirements. Quick test: why would one choose Patchgrams over XUpdate, and vice versa?

Given the broad set of constraints that will exist in different server implementations of the Atom Publishing Protocol, a generic XML patch format will have lots of features which just don’t make sense (e.g. XUpdate can create processing instructions, Patchgrams use document ordered positions of nodes for matching).

If you decide you really need a patch format for Atom documents, your best bet is working with the community to define one or more which are specific to the unique constraints of the Atom syndication format instead of hoping that there is a generic XML patch format out there you can shoehorn into a solution. In the words of Joe Gregorio’s former co-worker, “I make it fit!”.

Personally, I think you’ll still end up with so many different requirements (Atom stores backed by actual text documents will have different concerns from those backed by relational databases) and such spotty support for the capability that you are best off just walking away from this problem by fixing your data model. As I said before, if you have sub-resources which you think should be individually editable, then give them a URI and make them resources as well, complete with their own atom:entry element.
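
Here is a rough Python sketch of what I mean by fixing the data model. Everything in it is hypothetical: the ResourceStore class, the /contacts/42 URIs and the dictionary representations stand in for a real AtomPub server and its atom:entry documents.

```python
class ResourceStore:
    """Toy in-memory stand-in for a server: one representation per URI."""

    def __init__(self):
        self._resources = {}

    def put(self, uri, representation):
        # PUT replaces the whole representation at this URI, nothing else.
        self._resources[uri] = representation

    def get(self, uri):
        return self._resources[uri]


store = ResourceStore()

# The parent entry only links to its parts instead of embedding them.
store.put("/contacts/42", {
    "title": "Contact 42",
    "links": ["/contacts/42/phone", "/contacts/42/email"],
})

# Each individually editable part is its own resource (its own entry).
store.put("/contacts/42/phone", {"title": "Phone", "content": "555-0100"})
store.put("/contacts/42/email", {"title": "Email", "content": "someone@example.org"})

# Changing the phone number is an ordinary PUT to the sub-resource.
# No patch format is needed and the sibling resources are untouched.
store.put("/contacts/42/phone", {"title": "Phone", "content": "555-0199"})
```

The trade-off is more resources and more round trips, but every edit is a plain PUT that any AtomPub implementation already understands.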

Now playing: Oomp Camp - Time To Throw A Chair

Categories: XML Web Services


Saturday, 23 February 2008 16:45:16 (GMT Standard Time, UTC+00:00)

I was going to be amused by how often there is this desire to break the abstraction that XML is and attempt to do something to it as if it was a binary format. Then I realized that it is not funny and, more than that, it explains some quirkiness about the understanding that many of us have about data formats and their abstractions and thinking that XML and a serialization of XML are the same thing.

This goes back to the misguided effort, at one point, to put a resource patch into WebDAV, to apply digital signatures and encryption to the binary form of an XML data stream, etc.

The way this was solved for DSIG (and which makes people crazy) is that you need a canonical form of XML (which has a predictable binary form) and sign / encrypt that.

For patching it would seem that there must also be canonicalization and then there needs to be some way to deal with location of a part (XPATH I suppose?). And then the service that hosts the resource-to-be-patched must figure out how to actually work this with the storage form that is actually being used? Of course, if you want to patch at finer grain than an element (or some sequence of elements) or attribute, I am afraid to look at what might be available for that, especially if we are talking about content elements that are XML-safe encodings of some other format. And then there are comments and processing instructions and ...

I have a sneaking suspicion that an Atom-only case might not be enough simpler than the generic XML case, depending ...

orcmid

Saturday, 23 February 2008 17:31:08 (GMT Standard Time, UTC+00:00)

orcmid,
As I mentioned, the differences between regular XML and Atom call for different solutions. XPath would be a fine way to identify nodes in generic XML but not in Atom, since XPath is positional (i.e. paths assume document order is significant) while this is explicitly not the case in AtomPub. This is why Joe had to go with misusing xml:id in his original proposal, and why both Web3S and Astoria require an ID for each element.
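
A small Python sketch of the difference, under my own assumptions: the two documents, the "alt" and "edit" IDs and the helper names are invented, and this is not how Joe's proposal, Web3S or Astoria actually implement addressing.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

stored = """<entry>
  <link xml:id="alt" href="http://example.org/a"/>
  <link xml:id="edit" href="http://example.org/edit/a"/>
</entry>"""

# Same entry, children emitted in a different order.
reordered = """<entry>
  <link xml:id="edit" href="http://example.org/edit/a"/>
  <link xml:id="alt" href="http://example.org/a"/>
</entry>"""


def href_by_position(xml_text, index):
    """Positional addressing: 'the Nth link element' (0-based here)."""
    return ET.fromstring(xml_text).findall("link")[index].get("href")


def href_by_id(xml_text, wanted_id):
    """ID addressing: find the element whose xml:id matches, wherever it is."""
    for element in ET.fromstring(xml_text).iter():
        if element.get(XML_ID) == wanted_id:
            return element.get("href")
    return None


# The positional address picks a different link after reordering...
position_is_stable = href_by_position(stored, 0) == href_by_position(reordered, 0)
# ...while the ID address still lands on the same one.
id_is_stable = href_by_id(stored, "edit") == href_by_id(reordered, "edit")
```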

Dare Obasanjo

Sunday, 24 February 2008 16:55:14 (GMT Standard Time, UTC+00:00)

Got it. And now, another country is heard from:
http://www.oreillynet.com/xml/blog/2008/02/addressing_fragments_in_rest_1.html
Oh my oh my. Yup, fragment IDs. Urk.

orcmid


Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
