I read Marc Andreessen's Analyzing the Facebook Platform, three weeks in, and although there's a lot to agree with, I was also confused by how he defined a platform in the Web era. After a while, it occurred to me that Marc Andreessen's Ning is part of an increased interest by a number of Web players in chasing after what I like to call the GoDaddy 2.0 business model. Specifically, a surprisingly disparate group of companies seem to think that the business of providing people with domain names and free Web hosting so they can create their own "Web 2.0" service is interesting. None of these companies have actually come out and said it, but whenever I think of Ning or Amazon's S3+EC2, I see a bunch of services that seem to be diving backwards into the Web hosting business dominated by companies like GoDaddy.

Reading Marc's post, I realized that I didn't think of the facilities that a Web hosting provider gives you as "a platform". When I think of a Web platform, I think of an existing online service that enables developers to either harness the capabilities of that service or access its users in a way that allows the developers to add value to the user experience. The Facebook platform is definitely in this category. On the other hand, the building blocks that it takes to actually build a successful online service, including servers, bandwidth and software building blocks (LAMP/RoR/etc), can also be considered a platform. This is where Ning and Amazon's S3+EC2 come in. With that context, let's look at the parts of Marc's Analyzing the Facebook Platform, three weeks in which moved me to write this. Marc writes 

Third, there are three very powerful potential aspects of being a platform in the web era that Facebook does not embrace.

The first is that Facebook itself is not reprogrammable -- Facebook's own code and functionality remains closed and proprietary. You can layer new code and functionality on top of what Facebook's own programmers have built, but you cannot change the Facebook system itself at any level.

There doesn't seem to be anything fundamentally wrong with this. When you look at some of the most popular platforms in the history of the software industry such as Microsoft Windows, Microsoft Office, Mozilla Firefox or Java, you notice that none of them allow applications built on them to fundamentally change what they are. This isn't the goal of an application platform. More specifically, when one considers platforms that were already applications, such as Microsoft Office or Mozilla Firefox, it is clear that the extensibility they allow is intended to improve the user experience of the application and thus make the platform more sticky, not to let developers reprogram the application's core functionality.

The second is that all third-party code that uses the Facebook APIs has to run on third-party servers -- servers that you, as the developer provide. On the one hand, this is obviously fair and reasonable, given the value that Facebook developers are getting. On the other hand, this is a much higher hurdle for development than if code could be uploaded and run directly within the Facebook environment -- on the Facebook servers.

This is one unfortunate aspect of Web development which tends to harm hobbyists. Although it is possible for me to find the tools to create and distribute desktop applications for little or no cost, the same cannot be said about Web software. At the very least I need a public Web server and the ability to pay for the hosting bills if my service gets popular. This is one of the reasons I can create and distribute RSS Bandit to thousands of users as a part time project with no cost to me except my time but cannot say the same if I wanted to build something like Google Reader.

This is a significant barrier to adoption of certain Web platforms which is a deal breaker for many developers who potentially could add a lot of value. Unfortunately, building an infrastructure that allows you to run arbitrary code from random Web developers and gives these untrusted applications database access without harming your core service costs more in time and resources than most Web companies can afford. For now.

The third is that you cannot create your own world -- your own social network -- using the Facebook platform. You cannot build another Facebook with it.

See my response to his first point. The primary reason for the existence of the Facebook platform is to harness the creativity and resources of outside developers to benefit the social networks within Facebook. Allowing third party applications to fracture this social network or build competing services doesn't benefit Facebook. What Facebook offers developers is access to an audience of engaged users, and in exchange these developers make Facebook a more compelling service by building cool applications on it. That way everybody wins.

An application that takes off on Facebook is very quickly adopted by hundreds of thousands, and then millions -- in days! -- and then ultimately tens of millions of users.

Unless you're already operating your own systems at Facebook levels of scale, your servers will promptly explode from all the traffic and you will shortly be sending out an email like this.
...
The implication is, in my view, quite clear -- the Facebook Platform is primarily for use by either big companies, or venture-backed startups with the funding and capability to handle the slightly insane scale requirements. Individual developers are going to have a very hard time taking advantage of it in useful ways.

I think Marc is overblowing the problem here, if one can even call it a problem. A fundamental truth of building Web applications is that if your service is popular then you will eventually hit scale problems. This was happening last century during "Web 1.0" when eBay outages were regular headlines and website owners used to fear the Slashdot effect. Until the nature of the Internet is fundamentally changed, this will always be the case.

However none of this means you can't build a Web application unless you have VC money or are a big company. Instead, you should just have a strategy for how to deal with keeping your servers up and running if your service becomes a massive hit with users. It's a good problem to have, but one needs to remember that most Web applications will never have that problem. ;)

When you develop a new Facebook application, you submit it to the directory and someone at Facebook Inc. approves it -- or not.

If your application is not approved for any reason -- or if it's just taking too long -- you apparently have the option of letting your application go out "underground". This means that you need to start your application's proliferation some other way than listing it in the directory -- by promoting it somewhere else on the web, or getting your friends to use it.

But then it can apparently proliferate virally across Facebook just like an approved application.

I think the viral distribution model is probably one of the biggest innovations in the Facebook platform. Announcing to my friends whenever I install a new application so that they can try it out themselves is pretty killer. This feature probably needs to be fine-tuned so that I don't end up recommending or being recommended bad apps like X Me, but that is such a minor nitpick. This is potentially a game changing move in the world of software distribution. I mean, can you imagine if you got a notification whenever one of your friends discovered a useful Firefox add-on or a great Sidebar gadget? It definitely beats using TechCrunch or Download.com as your source of cool new apps.


 

Categories: Platforms | Social Software

Doc Searls has a blog post entitled Questions Du Jour where he writes

Dare Obasanjo: Why Facebook is Bigger than Blogging. In response to Kent Newsome's request for an explanation of why Facebook is so cool.

While I think Dare makes some good points, his headline (which differs somewhat from the case he makes in text) reads to me like "why phones are better than books". The logic required here is AND, not OR. Both are good, for their own reasons.

Unlike phones and books, however, neither blogging nor Facebook are the final forms of the basics they offer today.

I think Doc has mischaracterized my post on why social networking services have seen broader adoption than blogging. My post wasn't about which is better since such a statement is as unhelpful as saying a banana is better than an apple. A banana is a better source of potassium but a worse source of dietary fiber. Which is better depends on what metric you are interested in.

My post was about popularity and explaining why, in my opinion, more people create and update their profiles on social networks than write blogs. 


 

Categories: Social Software

Every once in a while someone asks me about software companies to work for in the Seattle area that aren't Microsoft, Amazon or Google. This is the fourth in a series of weekly posts about startups in the Seattle area that I often mention to people when they ask me this question.

Zillow is a real-estate Web site that is slowly revolutionizing how people approach the home buying experience. The service caters to buyers, sellers, potential sellers and real estate professionals in the following ways:

  1. For buyers: You can research a house and find out its vital statistics (e.g. number of bedrooms, square footage, etc.), its current estimated value and how much it sold for when it was last sold. In addition, you can scope out homes that were recently sold in the neighborhood and get a good visual representation of the housing market in a particular area.
  2. For sellers and agents: Homes for sale can be listed on the service.
  3. For potential sellers: You can post a Make Me Move™ price without having to actually list your home for sale.

I used Zillow as part of the home buying process when I got my place and I think the service is fantastic. They also have the right level of buzz given recent high-level poachings of Google employees and various profiles in the financial press.

The company was founded by Lloyd Frink and Richard Barton who are ex-Microsoft folks whose previous venture was Expedia, another Web site that revolutionized how people approached a common task.

Press: Fortune on Zillow

Number of Employees: 133

Location: Seattle, WA (Downtown)

Jobs: careers@zillow.hrmdirect.com, current open positions include a number of Software Development Engineer and Software Development Engineer in Test positions as well as a Systems Engineer and Program Manager position.


 

I had hoped to avoid talking about RESTful Web services for a couple of weeks but Yaron Goland's latest blog post APP and Dare, the sitting duck deserves a mention.  In his post, Yaron  talks concretely about some of the thinking that has gone on at Windows Live and other parts of Microsoft around building a RESTful protocol for accessing and manipulating data stores on the Web. He writes

I'll try to explain what's actually going on at Live. I know what's going on because my job for little over the last year has been to work with Live groups designing our platform strategy.
...
Most of the services in Live land follow a very similar design pattern, what I originally called S3C which stood for Structured data, with some kind of Schema (in the general sense, I don't mean XML Schema), with some kind of Search and usually manipulated with operations that look rather CRUD like. So it seemed fairly natural to figure out how to unify access to those services with a single protocol.
...
So with this in mind we first went to APP. It's the hottest thing around. Yahoo, Google, etc. everyone loves it. And as Dare pointed out in his last article Microsoft has adopted it and will continue to adopt it where it makes sense. There was only one problem - we couldn't make APP work in any sane way for our scenarios. In fact, after looking around for a bit, we couldn't find any protocol that really did what we needed. Because my boss hated the name S3C we renamed the spec Web3S and that's the name we published it under. The very first section of the spec explains our requirements. I also published a FAQ that explains the design rationale for Web3S. And sure enough, the very first question, 2.1, explains why we didn't use ATOM.
...
Why not just modify APP?
We considered this option but the changes needed to make APP work for our scenarios were so fundamental that it wasn't clear if the resulting protocol would still be APP. The core of ATOM is the feed/entry model. But that model is what causes us our problems. If we change the data model are we still dealing with the same protocol? I also have to admit that I was deathly afraid of the political implications of Microsoft messing around with APP. I suspect Mr. Bray's comments would be taken as a pleasant walk in the park compared to the kind of pummeling Microsoft would receive if it touched one hair on APP's head.

In his post, Yaron talks about two of the key limitations we saw with the Atom Publishing Protocol (i.e. lack of support for hierarchies and lack of support for granular updates to fields) and responds to the various suggestions about how one can workaround these problems in APP. As he states in the conclusion of his post we are very wary of suggestions to "embrace and extend" Web standards given the amount of negative press the company has gotten about that over the years. It seems better for the industry if we build a protocol that works for our needs and publish documentation about how it works so any interested party can interoperate with us than if we claim we support a Web standard when in truth it only "works with Microsoft" because it has been extended in incompatible ways.

Dealing with Hierarchy

Here's what Yaron had to say with regards to the discussion around APP's lack of explicit support for hierarchies

The idea that you put a link in the ATOM feed to the actual object. This isn't a bad idea if the goal was to publish notices about data. E.g. if I wanted to have a feed that published information about changes to my address book then having a link to the actual address book data in the APP entries is fine and dandy. But if the goal is to directly manipulate the address book's contents then having to first download the feed, pull out the URLs for the entries and then retrieve each and every one of those URLs in separate requests in order to pull together all the address book data is unacceptable from both an implementation simplicity and performance perspective. We need a way where by someone can get all the content in the address book at once. Also, each of our contacts, for example, are actually quite large trees. So the problem recurses. We need a way to get all the data in one contact at a go without having to necessarily pull down the entire address book. At the next level we need a way to get all the phone numbers for a single contact without having to download the entire contact and so on.

Yaron is really calling out two issues here. The first is that if you have a data type that doesn't map well to a piece of authored content, it is better represented as its own content type that is linked from an atom:entry than shoehorned into an atom:entry with its required author, summary and title fields. The second issue is the lack of explicit support for hierarchies. This situation is an example of how something that seems entirely reasonable in one scenario can be problematic in another. If you are editing blog posts, it probably isn't that much of a burden to first retrieve an Atom feed of all your recent blog posts, locate the link to the one you want to edit, then retrieve it for editing. In addition, since a blog post is authored content, the most relevant information about the post can be summarized in the atom:entry. On the other hand, if you want to retrieve your list of IM buddies so you can view their online status, or get the people in your friends list so you can see their recent status updates, it isn't pretty to fetch a feed of your contacts and then have to retrieve each contact one by one after locating the links to their representations in the Atom feed. Secondly, you may just want to address part of the data, such as a user's status message or online status, instead of retrieving or updating an entire user object.
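
To make the round trip concern concrete, here is a rough sketch (in Python) of the "fetch the feed, then fetch each member" pattern. The feed URL, the link relation and the single-request alternative in the final comment are all hypothetical; the point is only to illustrate the N+1 request problem.

import requests
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"
FEED_URL = "http://contacts.example.com/feeds/contacts"  # hypothetical address book feed

# One request for the feed...
feed = ET.fromstring(requests.get(FEED_URL).content)

# ...then one more request per contact just to get each full record.
contacts = []
for entry in feed.findall(ATOM_NS + "entry"):
    link = entry.find(ATOM_NS + "link[@rel='alternate']")
    contacts.append(requests.get(link.get("href")).content)

# With a directly addressable hierarchy the same data could be one request, e.g.:
# requests.get("http://contacts.example.com/addressbook")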

Below are specification excerpts showing how two RESTful protocols from Microsoft address these issues.

How Web3S Does It

The naming of elements and level of hierarchy in an XML document that is accessible via Web3S can be arbitrarily complex as long as it satisfies some structural constraints as specified in The Web3S Resource Infoset. The constraints include no mixed content and that multiple instances of an element with the same name as children of a node must be identified by a Web3S:ID element (e.g. multiple entries under a feed are identified by ID). Thus the representation of a Facebook user returned by the users.getInfo method in the Facebook REST API should be a valid Web3S document [except that the concentration element would have to be changed from having string content to having two element children, a Web3S:ID that can be used to address each concentration directly and another containing the current textual content].
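
To illustrate those structural constraints, below is a rough sketch of how one might check them. This is my own illustration, not code from the Web3S spec, and namespace handling is simplified by matching the Web3S:ID child on its local name.

import xml.etree.ElementTree as ET
from collections import Counter

def has_web3s_id(element):
    # Match the Web3S:ID child by local name to sidestep namespace details.
    return any(child.tag.split("}")[-1].split(":")[-1] == "ID" for child in element)

def check_web3s_constraints(element):
    children = list(element)
    # Constraint 1: no mixed content (an element holds text or children, not both).
    if children and element.text and element.text.strip():
        raise ValueError("mixed content under <%s>" % element.tag)
    # Constraint 2: repeated same-named siblings must each carry a Web3S:ID.
    names = Counter(child.tag for child in children)
    for child in children:
        if names[child.tag] > 1 and not has_web3s_id(child):
            raise ValueError("repeated <%s> without Web3S:ID" % child.tag)
        check_web3s_constraints(child)

# Usage (document_text would be a Web3S payload with its namespaces declared):
# check_web3s_constraints(ET.fromstring(document_text))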

The most important part of being able to properly represent hierarchies is that different levels of the hierarchy can be directly accessed. From the Web3S documentation section entitled Addressing Web3S Information Items in HTTP

In order to enable maximum flexibility Element Information Items (EIIs) are directly exposed as HTTP resources. That is, each EII can be addressed as a HTTP resource and manipulated with the usual methods...


<articles>
 <article>
  <Web3S:ID>8383</Web3S:ID>
  <title>Manual of Surgery Volume First: General Surgery. Sixth Edition.</title>
  <authors>
   <author>
    <Web3S:ID>23455</Web3S:ID>
    <firstname>Alexander</firstname>
    <lastname>Miles</lastname>    
   </author>
   <author>
    <Web3S:ID>88828</Web3S:ID>
    <firstname>Alexis</firstname>
    <lastname>Thomson</lastname>    
   </author>
  </authors>
 </article>
</articles>

If the non-Web3S prefix path is http://example.net/stuff/morestuff then we could address the lastname EII in Alexander Miles’s entry as http://example.net/stuff/morestuff/net.examples.articles/net.example.article(8383)/net.example.authors/net.example.author(23455)/org.example.lastname.

 Although String Information Items (SIIs) are modeled as resources they currently do not have their own URLs and therefore are addressed only in the context of EIIs. E.g. the value of an SII would be set by setting the value of its parent EII.

XML heads may balk at requiring IDs to differentiate elements with the same name at the same scope or level of hierarchy instead of using positional indexes like XPath does. The problem with positional indexes is that they assume XML document order is significant in the underlying data store, which may well not be the case.
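
For what it's worth, here is a rough sketch of what that direct addressability looks like from a client's point of view, using the URL from the example above. The PUT payload and media type handling are my guesses based on the spec excerpts, not something verified against a real Web3S endpoint.

import requests

lastname_url = ("http://example.net/stuff/morestuff/net.examples.articles/"
                "net.example.article(8383)/net.example.authors/"
                "net.example.author(23455)/org.example.lastname")

# Fetch only the lastname element instead of the whole article or author.
response = requests.get(lastname_url)

# Replace just that element in place (payload and media type are guesses).
requests.put(lastname_url,
             data="<org.example.lastname>Miles</org.example.lastname>",
             headers={"Content-Type": "application/Web3S+xml"})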

Supporting Granular Updates

Here's what Yaron had to say on the topic of supporting granular updates and the various suggestions that came up with regards to preventing the lost update problem in APP.

APP's approach to this problem is to have the client download all the content, change the stuff they understand and then upload all the content including stuff they don't understand.
...
On a practical level though the 'download then upload what you don't understand' approach is complicated. To make it work at all one has to use optimistic concurrency. For example, let's say I just want to change the first name of a contact and I want to use last update wins semantics. E.g. I don't want to use optimistic concurrency. But when I download the contact I get a first name and a last name. I don't care about the last name. I just want to change the first name. But since I don't have merge semantics I am forced to upload the entire record including both first name and last name. If someone changed the last name on the contact after I downloaded but before I uploaded I don't want to lose that change since I only want to change the first name. So I am forced to get an etag and then do an if-match and if the if-match fails then I have to download again and try again with a new etag. Besides creating race conditions I have to take on a whole bunch of extra complexity when all I wanted in the first place was just to do a 'last update wins' update of the first name.
...
A number of folks seem to agree that merge makes sense but they suggested that instead of using PUT we should use PATCH. Currently we use PUT with a specific content type (application/Web3S+xml). If you execute a PUT against a Web3S resources with that specific content-type then we will interpret the content using merge semantics. In other words by default PUT has replacement semantics unless you use our specific content-type on a Web3S resource. Should we use PATCH? I don't think so but I'm flexible on the topic.

This is one place where a number of APP experts such as Bill de hÓra and James Snell seem to agree that the current semantics in APP are insufficient. There also seems to be some consensus that it is too early to standardize a technology for partial updates of XML on the Web without lots more implementation experience. I also agree with that sentiment. So having it out of APP for now probably isn't a bad thing.

Currently I'm still torn on whether Web3S's use of PUT for submitting partial updates is kosher or whether it is more appropriate to invent  a new HTTP method called PATCH. There was a thread about this on the rest-discuss mailing list and for the most part it seems people felt that applying merge semantics on PUT requests for a specific media type is valid if the server understands that those are the semantics of that type. 
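
To see why the merge approach appeals for the simple case Yaron describes, here is a rough sketch contrasting the two styles. The contact URL, the payloads and the edit_firstname helper are hypothetical; the snippet only shows the difference in round trips and failure handling, not either protocol's exact wire format.

import requests

CONTACT_URL = "http://contacts.example.com/contacts/1234"  # hypothetical

# APP style: optimistic concurrency with ETag / If-Match. Changing one field
# means fetching the whole entry, editing it locally and retrying on conflict.
def app_style_update_firstname(new_name):
    while True:
        current = requests.get(CONTACT_URL)
        entry = edit_firstname(current.text, new_name)   # hypothetical helper
        result = requests.put(CONTACT_URL, data=entry,
                              headers={"If-Match": current.headers["ETag"],
                                       "Content-Type": "application/atom+xml;type=entry"})
        if result.status_code != 412:   # 412 Precondition Failed: lost the race, retry
            return result

# Merge style (as described above): send only the field being changed and let
# the server merge it, giving "last update wins" semantics for that one field.
def merge_style_update_firstname(new_name):
    body = "<contact><firstname>%s</firstname></contact>" % new_name
    return requests.put(CONTACT_URL, data=body,
                        headers={"Content-Type": "application/Web3S+xml"})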

How Web3S Does It

From the Web3S documentation section entitled Application/Web3S+xml with Merge Semantics

On its own the Application/Web3S+xml content type is used to represent a Web3S infoset. But the semantics of that infoset can change depending on what method it is used with.

In the case of PUT the semantics of the Application/Web3S+xml request body are “merge the infoset information in the Application/Web3S+xml request with the infoset of the EII identified in the request-URI.” This section defines how Application/Web3S+xml is to be handled specifically in the case of PUT or any other context in which the Web3S infoset in the Application/Web3S+xml serialization is to be merged with some existing Web3S infoset.

For example, imagine that the source contains:

 <whatever>
  <Web3S:ID>234</Web3S:ID>
  <yo>
   <Web3S:ID>efghi</Web3S:ID>
   <avalue />
   <somethingElse>YO!!!</somethingElse>
  </yo>
 </whatever>
Now imagine that the destination, before the merge, contains:

 <whatever>
  <nobodyhome />
 </whatever> 
In this example the only successful outcome of the merge would have to be:

 <whatever>
  <Web3S:ID>234</Web3S:ID>
  <yo>
   <Web3S:ID>efghi</Web3S:ID>
   <avalue />
   <somethingElse>YO!!!</somethingElse>
  </yo>
  <nobodyhome />
 </whatever>
In other words, not only would all of the source’s contents have to be copied over but the full names (E.g. EII names and IDs) must also be copied over exactly.
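
To make the merge behavior concrete, here is a minimal sketch of how I read those semantics: walk the source, match existing destination children by name and Web3S:ID, overwrite leaf values and recurse, and append anything new. This is my own illustration of the idea, not code from the spec.

import xml.etree.ElementTree as ET

def web3s_id(element):
    # Find the Web3S:ID child by local name, with or without a bound namespace.
    for child in element:
        if child.tag.split("}")[-1].split(":")[-1] == "ID":
            return child.text
    return None

def merge(source, destination):
    for src_child in source:
        match = next((dst_child for dst_child in destination
                      if dst_child.tag == src_child.tag
                      and web3s_id(dst_child) == web3s_id(src_child)), None)
        if match is None:
            destination.append(src_child)        # new element: copy it over wholesale
        else:
            if src_child.text and src_child.text.strip():
                match.text = src_child.text      # leaf value: overwrite
            merge(src_child, match)              # shared element: recurse into children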

This is an early draft of the spec so there are a lot of rules that aren't explicitly spelled out, but you should now get the gist of how Web3S works. If you have any questions, direct them to Yaron, not to me. I'm just an interested observer when it comes to Web3S. Yaron is the person to talk to if you want to make things happen. :)

In a couple of days I'll take a look at how Project Astoria deals with the same problems in a curiously similar fashion. Until then you can make do with Leonard Richardson's excellent review of Project Astoria. Until next time.


 

Categories: Windows Live | XML Web Services

We are about ready to ship the second release of RSS Bandit this year. This release will be codenamed ShadowCat. This version will primarily be about bug fixes with one or two minor convenience features thrown in. If you are an RSS Bandit user who's had problems with crashes or other instability in the application, please grab the installer at RssBandit.1.5.0.14.ShadowCat.Beta.zip and let us know if this improves your user experience. 

New Features

  • Newspaper view can be configured to show unread items in a feed as multiple pages instead of a single page of items

Major Bug Fixes

  • Javascript errors on Web pages result in error dialogs being displayed or the application hanging.
  • Search indexing thread takes 100% of CPU and freezes the computer.
  • Crashes related to Lucene search indexing (e.g. IO exceptions, access violations, file in use errors, etc)
  • Crash when a feed has an IRI (i.e. URL with unicode characters such as http://www.bücher-forum.de/forum/rdf.php) instead of issuing an error message since they are not supported.
  • Context menus no longer work on category nodes after OPML import or remote sync
  • Crash on deleting a feed which still has enclosures being downloaded
  • Podcasts downloaded from the http://poweruser.tv/ feed are named "..mp3" instead of the actual file name.
  • Items marked as read in a search folder not marked as read in original feed
  • No news shown when subscribed to newsgroups.borland.com
  • Can't subscribe to feeds on your local hard drive (regression since it worked in previous versions)

After the final release of ShadowCat, the following release of RSS Bandit (codenamed Phoenix) will contain our next major user interface revamp and will likely be the version where we move to version 2.0 of the .NET Framework. You can find some of our early screen shots on Flickr.


 

Categories: RSS Bandit

  1. Bigger disappointment: Clerks 2 or Idiocracy?
  2. Why is Carrot Top so buff?
  3. Best way to introduce my fiancée to the greatness of giant shape shifting robots: Transformers: The Movie (1986) or Transformers: The Movie (2007)?
  4. Was this article meant to be a joke?
  5. Best way to complete this sentence without the help of a search engine: Pinky, are you pondering what I'm pondering? I think so, Brain but...?

 

Categories: Ramblings

Omar Shahine has a blog post entitled Hotmail + Outlook = Sweet where he writes

At long last... experience Hotmail inside of Outlook.

What used to be a subscription only offering is now available to anyone that wants it. While Outlook used to have the ability to talk to Hotmail via DAV it was flaky and 2 years ago we no longer offered it to new users of the service.

Well the new Outlook Connector has a few notable features that you didn't get with the old DAV support:

  1. uses DeltaSync, a Microsoft developed HTTP based protocol that sync's data based on change sequence numbers. This means that the server is stateful about the client. Whenever the client connects to Hotmail, the server tells the clients of any changes that happened since the last time the client connected. This is super efficient and allows us to offer the service to many users at substantially lower overhead than stateless protocols. This is the same protocol utilized by Windows Live Mail. It's similar in nature to exchange Cached Mode or AirSync, the mobile sync stack used by Windows Mobile Devices.
  2. Sync of Address Book. Your Messenger/Hotmail contacts get stored in Outlook.
  3. Sync of Calendar (currently premium only)
  4. Sync of allow/block lists for safety/spam

I've been using the Microsoft Office Outlook Connector for a few years now and have always preferred it to the Web interface for accessing my email on Hotmail. It's great that this functionality is now free for anyone who owns a copy of Microsoft Outlook instead of being a subscription service.
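
As an aside, the change-sequence-number idea Omar describes is simple enough to sketch. The toy below is my own illustration of why a stateful delta sync is cheap for both server and client; it has nothing to do with the actual DeltaSync wire protocol.

class MailServer:
    def __init__(self):
        self.change_number = 0
        self.changes = []          # (change number, operation) pairs
        self.client_state = {}     # client id -> last change number it has seen

    def record(self, operation):
        self.change_number += 1
        self.changes.append((self.change_number, operation))

    def sync(self, client_id):
        # The server is stateful about the client: it remembers where each
        # client left off and only sends the delta since that point.
        last_seen = self.client_state.get(client_id, 0)
        delta = [op for number, op in self.changes if number > last_seen]
        self.client_state[client_id] = self.change_number
        return delta

server = MailServer()
server.record("new message 1")
print(server.sync("outlook-at-home"))   # ['new message 1']
server.record("delete message 1")
print(server.sync("outlook-at-home"))   # ['delete message 1'] -- only the change, not the whole mailbox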

PS: Omar mentioning Hotmail and Microsoft Outlook's use of WebDAV reminds me that there have been other times in recent memory when enthusiasm for RESTful Web protocols swept Microsoft. Without rereading old MSDN articles like Communicating XML Data over the Web with WebDAV from the days when Microsoft Office, Microsoft Exchange and Internet Information Services (IIS) all supported WebDAV, it's easy to forget that Microsoft almost ended up standardizing on WebDAV as the primary protocol for reading and writing Microsoft data sources on the Web. Of course, then SOAP and WS-* happened. :)


 

Categories: Windows Live | XML Web Services

If you don't read Stevey Yegge's blog, you should. You can consider him to be the new school version of Joel Spolsky, especially now that most of Joel's writing is about what's going on at Fog Creek Software and random rants about applications he's using. However, you should be warned that Stevey writes long posts full of metaphors which often border on allegory.

Consider his most recent post That Old MarshMallow Maze Spell which is an interesting read but full of obfuscation. I actually haven't finished it since it is rather longer than the time I tend to devote to a single blog post. I've been trying to track down summaries of the post and the best I've gotten so far are some comments about the post on reddit which seem to imply that the allegory is about being burned out due to some death march project at his current employer.

I'm as down with schadenfreude as the next guy but a death march project seems wildly contradictory to Stevey's previous post Good Agile, Bad Agile where he wrote

The basic idea behind project management is that you drive a project to completion. It's an overt process, a shepherding: by dint of leadership, and organization, and sheer force of will, you cause something to happen that wouldn't otherwise have happened on its own.
Project management comes in many flavors, from lightweight to heavyweight, but all flavors share the property that they are external forces acting on an organization.

At Google, projects launch because it's the least-energy state for the system.
...
Anyway, I claimed that launching projects is the natural state that Google's internal ecosystem tends towards, and it's because they pump so much energy into pointing people in that direction. All your needs are taken care of so that you can focus, and as I've described, there are lots of incentives for focusing on things that Google likes.

So launches become an emergent property of the system.

This eliminates the need for a bunch of standard project management ideas and methods: all the ones concerned with dealing with slackers, calling bluffs on estimates, forcing people to come to consensus on shared design issues, and so on. You don't need "war team meetings," and you don't need status reports. You don't need them because people are already incented to do the right things and to work together well.

So, did anyone else get anything else out of Stevey's post besides "even at Google we have death marches that suck the soul out of you"? After all, I just kinda assumed that came with the territory.


 

One of the accusations made by Tim Bray in his post I’ve Seen This Movie is that my recent  posts about the Atom Publishing Protocol are part of some sinister plot by Microsoft to not support it in our blogging clients. That's really ironic considering that Microsoft is probably the only company that has shipped two blogging clients that support APP.

Don't take my word for it. In his blog post entitled Microsoft is not sabotaging APP (probably) Joe Cheng of the Windows Live Writer team writes

  1. Microsoft has already shipped a general purpose APP client (Word 2007) and GData implementation (Windows Live Writer). These are the two main blogging tools that Microsoft has to offer, and while I can’t speak for the Word team, the Writer team is serious about supporting Atom going forward.
  2. These two clients also already post to most blogs, not just Spaces. In particular, Writer works hard to integrate seamlessly with even clearly buggy blog services. I don’t know anyone who works as hard as we do at this.
  3. ...
  4. Spaces may not support APP, but it does support MetaWeblog which Microsoft has a lot less influence over than APP (since MW is controlled by Dave Winer, not by an official standards body). Consider that many of its main competitors, including MySpace, Vox, and almost all overseas social networking sites, have poor or nonexistent support for any APIs.

The reasoning behind Windows Live Spaces supporting the MetaWeblog API and not the Atom Publishing Protocol is detailed in my blog posts What Blog Posting APIs are supported by MSN Spaces? and Update on Blog Posting APIs and MSN Spaces, which I made over two years ago when we were going through the decision process for what the API story should be for Windows Live Spaces. For those who don't have time to read both posts, it basically came down to choosing a mature de facto standard (i.e. the MetaWeblog API) instead of (i) creating a proprietary protocol which better served our needs or (ii) taking a bet on the Atom Publishing Protocol spec which was a moving target in 2004 and is still a moving target today in 2007.

I hope this clears up any questions about Microsoft and APP. I'm pretty much done talking about this particular topic for the next couple of weeks.

PS: You can download Windows Live Writer from here and you can buy Office 2007 wherever great software is sold.


 

Categories: Windows Live | XML Web Services

I recently posted a blog post entitled Why GData/APP Fails as a General Purpose Editing Protocol for the Web which pointed out some limitations in the Atom Publishing Protocol (APP) and Google's implementation of it in GData with regards to being a general purpose protocol for updating data stores on the Web. There were a lot of good responses to my post from developers knowledgeable about APP including the authors of the specification, Bill de hÓra and Joe Gregorio. Below are links to some of these responses

Joe Gregorio: In which we narrowly save Dare from inventing his own publishing protocol
Bill de hÓra: APP on the Web has failed: miserably, utterly, and completely
David Megginson: REST, the Lost Update Problem, and the Sneakernet Test
Bill de hÓra: Social networks, web publishing and strategy tax
James Snell: Silly

There was also a post by Tim Bray entitled So Lame which questions my motives for writing the post and implies that it is some sinister plot by Microsoft to make sure that we use proprietary technologies to lock users in. I guess I should have given more background in my previous post. The fact is that lots of people have embraced building RESTful Web Services in a big way. My primary concern now is that we don't end up seeing umpteen different RESTful protocols from Microsoft [thus confusing our users and ourselves] and instead standardize on one or two. For example, right now we already have Atom+SSE, Web3S and Project Astoria as three completely different RESTful approaches for updating or retrieving data from a Microsoft data source on the Web. In my mind, that's two too many and that's just the stuff we've made public so there could be more. I'd personally like to see us reduce the set of RESTful protocols coming out of Microsoft to one and even better end up reusing existing Web standards, if possible. Of course, this is an aspiration and it is quite possible that all of these protocols are different for a reason (e.g. we have FTP, SMTP, and HTTP which all can be used to transfer files but have very different use cases) and there is no hope for unification let alone picking some existing standard. My previous post  was intended to point out the limitations I and others had noticed with using the Atom Publishing Protocol (APP) as a general protocol for updating data stores that didn't primarily consist of authored content. The point of the post was to share these learnings with other developers working in this space and get feedback from the general developer community just in case there was something wrong with my conclusions.

Anyway, back to the title of this post. In my previous post I pointed out the following limitations of APP as a general purpose protocol for editing Web content

  1. Mismatch with data models that aren't microcontent
  2. Lack of support for granular updates to fields of an item
  3. Poor support for hierarchy

I have to admit that a lot of my analysis was done on GData because I assumed incorrectly that it is a superset of the Atom Publishing Protocol. After a closer reading of the most recent draft (draft 15) of the APP specification, spurred by the responses to my post from various members of the Atom community, it seems clear that the approaches chosen by Google in GData run counter to the recommendations of Atom experts including both authors of the spec.

For problem #1, the consensus from Atom experts was that instead of trying to map a distinct concept such as a Facebook user to an Atom entry complete with a long list of proprietary extensions to the atom:entry element, one should instead create a specific data format for that type then treat it as a distinct media type that is linked from atom:entry. Thus in the Facebook example from my previous post, one would have a distinct user.xml file and a corresponding atom:entry which linked to it for each user of the system. Contrast this with the use of the gd:ContactSection in an atom:entry for representing a user. It also seems that the GData solution to the problem of what to put in the elements such as atom:author and atom:summary which are required by the specification but make no sense outside of content/microcontent editing scenarios is to omit them. It isn't spec compliant but I guess it is easier than putting in nonsensical values to satisfy some notion of a valid feed.

For problem #2, a number of folks pointed out that conditional PUT requests using ETags and the If-Match header are actually in the spec. This was my oversight since I skipped that section because its title, "Caching and Entity Tags", didn't imply that it had anything to do with dealing with the lost update problem. I actually haven't found a production implementation of APP that supports conditional PUTs, but this shouldn't be hard to implement for services that require this functionality. This definitely makes the lost update problem more tractable. However a model where a client can just say "update the user's status message to X" still seems more straightforward than one where the client says "get the entire user element", "update the user's status message to X on the client", "replace the user on the server with my version of the user", and potentially "there is a version mismatch so merge my version of the user with the most recent version of the user from the server and try again". The mechanism GData uses for solving the lost update problem is described in the documentation topic on Optimistic concurrency (versioning). Instead of using ETags and If-Match, GData appends a version number to the URL to which the client publishes the updated atom:entry and then cries foul if the client publishes to a URL with an old version number. I guess you could consider this a different implementation of conditional PUTs from what is recommended in the most recent version of the APP draft spec.
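
For what it's worth, here is a rough sketch of the versioned-URI flavor of optimistic concurrency as I understand it from the GData documentation. The extract_edit_link and apply_change helpers are hypothetical and the status code handling is simplified.

import requests

def gdata_style_update(entry_uri, apply_change):
    while True:
        current = requests.get(entry_uri)
        edit_uri = extract_edit_link(current.text)   # hypothetical helper; the edit URI carries a version
        updated = apply_change(current.text)         # apply the client's edit to the full entry
        result = requests.put(edit_uri, data=updated,
                              headers={"Content-Type": "application/atom+xml"})
        if result.status_code != 409:                # 409 Conflict: someone else updated it first, retry
            return result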

For problem #3, the consensus seemed to be to use atom:link elements to show hierarchy, similar to what has been done in the Atom threading extensions. I don't question the value of linking and think this is a fine approach for the most part. However, the fact is that in certain scenarios [especially high traffic ones] it is better for the client to be able to make requests like "give me the email with message ID 6789 and all the replies in that thread" than "give me all the emails and I'll figure out the hierarchy I'm interested in myself by piecing together link relationships". I notice that GData completely punts on representing hierarchy in the MessageKind construct which is intended for use in representing email messages.

Anyway I've learned my lesson and will treat the Atom Publishing Protocol (APP) and GData as separate protocols instead of using them interchangeably in the future.


 

Categories: XML Web Services