Joe Gregorio, one of the editors of RFC 5023: The Atom Publishing Protocol, has declared it a failure in his blog post titled The Atom Publishing Protocol is a failure where he writes

The Atom Publishing Protocol is a failure. Now that I've met by blogging-hyperbole-quotient for the day let's talk about standards, protocols, and technology… AtomPub isn't a failure, but it hasn't seen the level of adoption I had hoped to see at this point in its life.

Thick clients, RIAs, were supposed to be a much larger component of your online life. The cliche at the time was, "you can't run Word in a browser". Well, we know how well that's held up. I expect a similar lifetime for today's equivalent cliche, "you can't do photo editing in a browser". The reality is that more and more functionality is moving into the browser and that takes away one of the driving forces for an editing protocol.

Another motivation was the "Editing on the airplane" scenario. The idea was that you wouldn't always be online and when you were offline you couldn't use your browser. The part of this cliche that wasn't put down by Virgin Atlantic and Edge cards was finished off by Gears and DVCS's.

The last motivation was for a common interchange format. The idea was that with a common format you could build up libraries and make it easy to move information around. The 'problem' in this case is that a better format came along in the interim: JSON. JSON, born of Javascript, born of the browser, is the perfect 'data' interchange format, and here I am distinguishing between 'data' interchange and 'document' interchange. If all you want to do is get data from point A to B then JSON is a much easier format to generate and consume as it maps directly into data structures, as opposed to a document oriented format like Atom, which has to be mapped manually into data structures and that mapping will be different from library to library.

The Atom effort rose up around the set of scenarios related to blogging-based applications at the turn of the decade: RSS readers and blog editing tools. The Atom syndication format was supposed to be a boon for RSS readers while the Atom Publishing Protocol was intended to make things better for blog editing tools. There was also the expectation that the format and protocol were general enough that they would standardize all microcontent syndication and publishing scenarios.

The Atom syndication format has been as successful as, or perhaps even more successful than, originally intended because its original scenarios are still fairly relevant on today's Web. Reading blogs using feed readers like Google Reader, Outlook and RSS Bandit is still just as relevant today as it was six or seven years ago. Secondly, interesting new ways to consume feeds have sprung up in the form of social aggregation via sites such as FriendFeed. Also, since the Atom format is actually a generic format for syndicating microcontent, it has proved useful as new classes of microcontent have shown up on the Web, such as streams of status updates and social network news feeds. Thanks to the Atom syndication format's extensibility, it is being applied to these new scenarios in effective ways via community efforts such as and OpenSocial.

On the other hand, Joe is right that the Atom Publishing Protocol hasn't fared as well with the times. Today, editing a blog post via a Web-based blog editing tool like the edit box on a site like Blogger is a significantly richer experience than it was at the turn of the century. The addition of features such as automatically saving draft posts has also worked towards obviating some of the claimed benefits of desktop applications for editing blogs. For example, even though I use Windows Live Writer to edit my blog I haven't been able to convince my wife to switch to using it for editing her blog because she finds the Web "edit box" experience to be good enough. The double whammy comes from the fact that although new forms of microcontent have shown up which do encourage the existence of desktop tools (e.g. there are almost a hundred desktop Twitter apps and the list of Facebook desktop applications is growing rapidly), the services which provide these content types have shunned AtomPub and embraced JSON as the way to expose APIs for rich clients to interact with their content. The primary reason for this is that JSON works well as a data format for both browser-based client apps and desktop apps since it is more compatible with object-oriented programming models and the browser security model than an XML-based document-centric data format.

In my opinion, the growth in popularity of object-centric JSON over document-centric XML as the way to expose APIs on the Web has been the real stake in the heart for the Atom Publishing Protocol.
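To make the object-centric versus document-centric contrast concrete, here is a minimal Python sketch that reads the same status update from a JSON payload and from an Atom entry. The payloads, the `ex:replies` extension element and the function names are all invented for illustration and don't come from any real service.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same status update.
JSON_PAYLOAD = '{"author": "carnage4life", "text": "AtomPub vs. JSON", "replies": 3}'

# Trimmed fragment: a valid Atom entry also needs id, title and updated.
ATOM_PAYLOAD = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:ex="http://example.com/ns">
  <author><name>carnage4life</name></author>
  <content type="text">AtomPub vs. JSON</content>
  <ex:replies>3</ex:replies>
</entry>"""

NS = {"atom": "http://www.w3.org/2005/Atom", "ex": "http://example.com/ns"}


def status_from_json(payload):
    # JSON maps straight onto dicts, strings and numbers.
    return json.loads(payload)


def status_from_atom(payload):
    # The Atom entry has to be walked and converted by hand, and another
    # library might pick a different mapping for the same document.
    entry = ET.fromstring(payload)
    return {
        "author": entry.findtext("atom:author/atom:name", namespaces=NS),
        "text": entry.findtext("atom:content", namespaces=NS),
        "replies": int(entry.findtext("ex:replies", namespaces=NS)),
    }
```

Both functions end up with the same dictionary, but only the Atom version needs namespace handling, path lookups and an explicit type conversion.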


  • JSON vs. XML: Browser Programming Models – Dare Obasanjo on why JSON is more attractive than XML for Web-based mashups due to presenting a friendlier programming model for the browser. 

  • JSON vs. XML: Browser Security Model – Dare Obasanjo on why JSON is more attractive than XML for Web-based mashups due to circumventing some of the browser security constraints placed on XML.

Now Playing: Lady Gaga - Poker Face


April 13, 2009
@ 12:14 AM

Now Playing: Jay-Z - A Dream


Categories: Personal

Todd Hoff over on the high scalability blog has an interesting post along the vein of my favorite mantra of Disk is the new tape titled Are Cloud Based Memory Architectures the Next Big Thing? where he writes

Why might cloud based memory architectures be the next big thing? For now we'll just address the memory based architecture part of the question, the cloud component is covered a little later.

Behold the power of keeping data in memory:

Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.

This text was adapted from notes on Google Fellow Jeff Dean keynote speech at WSDM 2009.

Google isn't the only one getting a performance bang from moving data into memory. Both LinkedIn and Digg keep the graph of their network social network in memory. Facebook has northwards of 800 memcached servers creating a reservoir of 28 terabytes of memory enabling a 99% cache hit rate. Even little guys can handle 100s of millions of events per day by using memory instead of disk.

The entire post is somewhat confusing since it seems to mix ideas that should be two or three different blog posts into a single entry. Of the many ideas thrown around in the post, the one I find most interesting is the trend of treating in-memory storage as a core part of how a system functions, not just as an optimization that keeps you from having to go to disk.

The LinkedIn architecture is a great example of this trend. They have servers which they call The Cloud whose job is to cache the site's entire social graph in memory, and they run multiple instances of this cached social graph. Going to disk to satisfy social graph related queries, which can require touching data for hundreds to thousands of users, is simply never an option. This is different from how you would traditionally treat a caching layer such as ASP.NET caching or typical usage of memcached.

To build such a memory-based architecture there are a number of features you need to consider that don't come out of the box in a caching product like memcached. The first is data redundancy, which is unsupported in memcached. There are forty instances of LinkedIn's in-memory social graph which have to be kept mostly in sync without putting too much pressure on the underlying databases. Another feature common to such memory-based architectures that you won't find in memcached is transparent support for failover. When your data is spread out across multiple servers, losing a server should not mean that an entire server's worth of data is no longer being served out of the cache. This is especially of concern when you have a decent sized server cloud because it should be expected that servers come and go out of rotation all the time given hardware failures. Memcached users can solve this problem by using libraries that support consistent hashing (my preference) or by keeping a server available as a hot spare with the same IP address as the downed server. The key problem with the lack of native failover support is that there is no way to automatically rebalance the workload across the pool as servers are added to and removed from the cloud.
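For readers who haven't run into consistent hashing, here is a toy sketch of the idea in Python. It is not how any particular memcached client library implements it; the class name, the server names and the choice of 100 virtual points per server are assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring; server names below are hypothetical."""

    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual points per server
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def server_for(self, key):
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache1", "cache2", "cache3"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.server_for(k) for k in keys}

ring.remove("cache2")  # simulate losing one cache server
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When `cache2` drops out, only the keys it used to own are remapped to the surviving servers; keys that lived on `cache1` and `cache3` stay where they were. With naive `hash(key) % number_of_servers` placement, most keys would move.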

For large scale Web applications, if you're going to disk to serve data in your primary scenarios then you're probably doing something wrong. The path most scaling stories take is that people start with a database, partition it and then move to a heavily cache-based approach when they start hitting the limits of a disk-based system. This is now common knowledge in the pantheon of Web development. What doesn't yet seem to be part of the communal knowledge is the leap from being cache-based with the option to fall back to a DB to simply being memory-based. The shift is slight but fundamental.
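One way to picture that shift is to compare the read paths. The sketch below is my own simplification, not a description of any specific system: a plain dict stands in for the disk-backed database, and both class names are hypothetical.

```python
class CacheAside:
    """Cache-based: reads fall back to the database on a miss."""

    def __init__(self, db):
        self.db = db        # dict standing in for a disk-backed database
        self.cache = {}
        self.db_reads = 0   # counts how often a read had to go to "disk"

    def get(self, key):
        if key not in self.cache:
            self.db_reads += 1
            self.cache[key] = self.db[key]
        return self.cache[key]


class MemoryBased:
    """Memory-based: the whole working set is loaded up front."""

    def __init__(self, db):
        self.db = db
        self.memory = dict(db)  # load everything at startup

    def get(self, key):
        return self.memory[key]  # reads never fall back to disk

    def put(self, key, value):
        self.memory[key] = value
        self.db[key] = value     # persisted for recovery, not for reads
```

In the first class the database sits on the read path whenever the cache misses. In the second it only exists so the in-memory copy can be rebuilt, which is why redundancy and failover of the memory tier become requirements rather than nice-to-haves.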


  • Improving Running Components at Twitter: Evan Weaver's presentation on how Twitter improved their performance by two orders of magnitude by increasing their usage of caches and improving their caching policies among other changes. Favorite quote, "Everything runs from memory in Web 2.0".

  • Defining a data grid: A description of some of the features that an in-memory architecture needs beyond simply having data cached in memory.

Now Playing: Ludacris - Last Of A Dying Breed [Ft. Lil Wayne]


Categories: Web Development

Last week, Digg announced the launch of the DiggBar which manages to combine two trends that Web geeks can't stand. It is both a URL shortener (whose problems are captured in the excellent post by Joshua Schachter on URL shorteners) and a revival of the trend of one website putting another's content in a frame (which is criticized in detail in the Wikipedia article on framing on the World Wide Web).

The increasing popularity of URL shortening services has been fueled by the growth of Twitter. Twitter has a 140 character limit on posts, which means users sharing links on the site often have to find some way of shortening URLs to make their content fit within the limit. From my perspective, this is really a problem that Twitter should fix given the amount of collateral damage the growth of these services may end up inflicting on the Web.

Some Web developers believe this problem can be solved by the judicious use of microformats. One such developer is Chris Shiflett who has written a post entitled Save the Internet with rev="canonical" which states the following

There's a new proposal ("URL shortening that doesn't hurt the Internet") floating around for using rev="canonical" to help put a stop to the URL-shortening madness. It sounds like a pretty good idea, and based on some discussions on IRC this morning, I think a more thorough explanation would be helpful. I'm going to try.

This is easiest to explain with an example. I have an article about CSRF located at the following URL:

I happen to think this URL is beautiful. :-) Unfortunately, it is sure to get mangled into some garbage URL if you try to talk about it on Twitter, because it's not very short. I really hate when that happens. What can I do?

If rev="canonical" gains momentum and support, I can offer my own short URL for people who need one. Perhaps I decide the following is an acceptable alternative:

Here are some clear advantages this URL has over any replacement:

  • The URL is mine. If it goes away, it's my fault. (Ma.gnolia reminds us of the potential for data loss when relying on third parties.)
  • The URL has meaning. Both the domain ( and the path (csrf) are meaningful.
  • Because the URL has meaning, visitors who click the link know where they're going.
  • I can search for links to my content; they're not hidden behind an indefinite number of short URLs.

There are other advantages, but these are the few I can think of quickly.

Let's try to walk through how this is expected to work. I type in a long URL like into Twitter. Twitter allows me to post the URL and then crawls the site to see if it has a link tag with a rev="canonical" attribute. It finds one and then replaces the long URL with something like which is the alternate short URL I've created for my talk. What could go wrong? :)

So for this to solve the problem, every site that could potentially be linked to from Twitter (i.e. every website in the world) needs to run their own URL shortening service. Then Twitter needs to make sure to crawl the website behind every URL in every tweet that flows through the system.  Oh yeah, and the fact is that the URLs still aren't as efficient as those created by sites like unless everyone buys a short domain name as well.
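To give a sense of what that per-link crawl involves, here is a minimal Python sketch of just the discovery step: given the HTML of a page that has already been fetched, look for a link tag with rev="canonical". The class and function names and the example.com URL are placeholders of mine, and a real service would also have to fetch every page, handle timeouts and deal with malformed markup.

```python
from html.parser import HTMLParser


class RevCanonicalFinder(HTMLParser):
    """Collects href values of <link rev="canonical"> tags."""

    def __init__(self):
        super().__init__()
        self.short_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rev may hold several space-separated tokens, or no value at all.
        rev_tokens = (attr.get("rev") or "").lower().split()
        if "canonical" in rev_tokens and attr.get("href"):
            self.short_urls.append(attr["href"])


def find_short_url(html):
    """Return the first advertised short URL, or None if there is none."""
    finder = RevCanonicalFinder()
    finder.feed(html)
    return finder.short_urls[0] if finder.short_urls else None


# Hypothetical page; the example.com URL is a placeholder.
PAGE = """<html><head>
<title>An article</title>
<link rev="canonical" href="http://example.com/csrf">
</head><body>...</body></html>"""

short_url = find_short_url(PAGE)
```

The parsing is the easy part. The cost is in doing a fetch like this for every URL in every tweet, and in falling back to a third-party shortener whenever `find_short_url` returns `None`.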

Sounds like a lot of stars have to align to make this useful to the general populace and not just a hack that is implemented by a couple dozen web geeks.  

Now Playing: Nas - Hate Me Now (feat. Puff Daddy)


Categories: Web Development

Last week, Douglas Bowman posted a screed against basing web design decisions strictly on usage data. In a post entitled Goodbye Google, he wrote

With every new design decision, critics cry foul. Without conviction, doubt creeps in. Instincts fail. “Is this the right move?” When a company is filled with engineers, it turns to engineering to solve problems. Reduce each decision to a simple logic problem. Remove all subjectivity and just look at the data. Data in your favor? Ok, launch it. Data shows negative effects? Back to the drawing board. And that data eventually becomes a crutch for every decision, paralyzing the company and preventing it from making any daring design decisions.

Yes, it’s true that a team at Google couldn’t decide between two blues, so they’re testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that. I’ve grown tired of debating such minuscule design decisions. There are more exciting design problems in this world to tackle.

I can’t fault Google for this reliance on data. And I can’t exactly point to financial failure or a shrinking number of users to prove it has done anything wrong. Billions of shareholder dollars are at stake. The company has millions of users around the world to please. That’s no easy task. Google has momentum, and its leadership found a path that works very well.

One thing I love about building web-based software is that there is the unique ability to try out different designs and test them in front of thousands to millions of users without incurring a massive cost. Experimentation practices such as A/B testing and Multivariate testing enable web designers to measure the impact of their designs on the usability of a site on actual users instead of having to resort to theoretical arguments about the quality of the design or waiting until after they've shipped to find out the new design is a mistake.
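As a rough illustration of the mechanics, an A/B test starts by splitting users between designs in a way that is stable across visits. The sketch below is one common way to do that, not a description of how any particular site does it; the function name and the experiment and user identifiers are made up.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "test")):
    """Deterministically map a user to a variant of an experiment."""
    # Hashing the experiment name together with the user id keeps a user
    # in the same bucket on every visit, while letting the same user land
    # in different buckets for different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical experiment name and user ids.
assignments = [assign_variant(f"user{n}", "cart-recommendations") for n in range(10000)]
control_share = assignments.count("control") / len(assignments)
```

Because the assignment depends only on the ids, no per-user state has to be stored, and the split comes out close to even across a large population.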

Experimentation is most useful when you have a clear goal or workflow the design is trying to achieve and you are worried that a design change may impact that goal. A great example of this is how shopping cart recommendations were shipped at Amazon, a story Greg Linden tells in his post Early Amazon: Shopping cart recommendations, excerpted below

The idea of recommending items at checkout is nothing new. Grocery stories put candy and other impulse buys in the checkout lanes. Hardware stores put small tools and gadgets near the register. But here we had an opportunity to personalize impulse buys. It is as if the rack near the checkout lane peered into your grocery cart and magically rearranged the candy based on what you are buying. Health food in your cart? Let's bubble that organic dark chocolate bar to the top of the impulse buys. Steaks and soda? Get those snack-sized potato chip bags up there right away.

I hacked up a prototype. On a test site, I modified the shopping cart page to recommend other items you might enjoy adding to your cart. Looked pretty good to me. I started showing it around. While the reaction was positive, there was some concern. In particular, a marketing senior vice-president was dead set against it. His main objection was that it might distract people away from checking out -- it is true that it is much easier and more common to see customers abandon their cart at the register in online retail -- and he rallied others to his cause.

At this point, I was told I was forbidden to work on this any further. I was told Amazon was not ready to launch this feature. It should have stopped there. Instead, I prepared the feature for an online test. I believed in shopping cart recommendations. I wanted to measure the sales impact. I heard the SVP was angry when he discovered I was pushing out a test. But, even for top executives, it was hard to block a test. Measurement is good. The only good argument against testing would be that the negative impact might be so severe that Amazon couldn't afford it, a difficult claim to make. The test rolled out.

The results were clear. Not only did it win, but the feature won by such a wide margin that not having it live was costing Amazon a noticeable chunk of change. With new urgency, shopping cart recommendations launched.

This is a great example of using data to validate a design change instead of relying on gut feel. However, one thing that is often overlooked is that the changes still have to be well-designed. The shopping cart recommendations feature on Amazon is designed in such a way that it doesn't break you out of the checkout flow. See below for a screenshot of the current shopping cart recommendation flow on Amazon

On the above page, it is always very clear how to complete the checkout AND the process of adding an item to the cart is a one click process that keeps you on the same page. Sadly, a lot of sites have tried to implement similar features but often end up causing users to abandon shopping carts because the design encourages users to break their flow as part of the checkout process.

One of the places experimentation falls down is when it is used to measure the impact of aesthetic changes to the site when these changes aren't part of a particular workflow (e.g. changing the company logo). Another problem with experimentation is that it may encourage focusing on metrics that are easily measurable to the detriment of other aspects of the user experience. For example, Google's famous holiday logos were a way to show off the fun, light-hearted aspect of their brand. Doing A/B testing on whether people do more searches with or without the holiday logos on the page would miss the point. Similarly, even if A/B testing does show that a design hurts particular workflows, it is sometimes worth keeping if the message behind the design benefits the brand. For example, take this story from Valleywag, "I'm feeling lucky" button costs Google $110 million per year

Google cofounder Sergey Brin told public radio's Marketplace that around one percent of all Google searches go through the "I'm Feeling Lucky" button. Because the button takes users directly to the top search result, Google doesn't get to show search ads on one percent of all its searches. That costs the company around $110 million in annual revenue, according to Rapt's Tom Chavez. So why does Google keep such a costly button around?

"It's possible to become too dry, too corporate, too much about making money. I think what's delightful about 'I'm Feeling Lucky' is that it reminds you there are real people here," Google exec Marissa Mayer explained


Last night, I stumbled on a design change in Twitter that I suspect wouldn't have been rolled out if it had gone through experimentation first. On the Twitter blog, Biz Stone writes Replies Are Now Mentions 

We're updating the Replies feature and referring to it instead as Mentions. In your Twitter sidebar you'll now see your own @username tab. When you click that tab, you'll see a list of all tweets referencing your account with the @username convention anywhere in the tweet—instead of only at the beginning which is how it used to work. So for me it would be all mentions of @biz. For developers, this update will also be included in the APIs.

Compare the above sidebar with the previous one below and which do you think will be more intuitive for new users to understand?

This would be a great candidate to test because the metric is straightforward: compare clicks on the replies tab by new users, using the old version as the control and the new version as the test. Then again, maybe they did A/B test it, which is why the "@username" text is used instead of "Mentions", which is even more unintuitive. :)
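For what it's worth, the comparison itself is a few lines of arithmetic. Below is a sketch of a standard two-proportion z-test in Python; the click and user counts are invented purely to show the calculation and are not Twitter data.

```python
import math


def two_proportion_z_test(clicks_a, users_a, clicks_b, users_b):
    """Return (z, two-sided p-value) for the difference in click rates."""
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    # Pooled click rate under the assumption that both designs perform the same.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: control is the old "Replies" tab, test is the "@username" tab.
z, p = two_proportion_z_test(clicks_a=1200, users_a=10_000,
                             clicks_b=1100, users_b=10_000)
```

A negative `z` with a small `p` would suggest that new users click the new tab less often than the old one, which is exactly the kind of signal you would want before rolling out the change.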

Now Playing: Jamie Foxx - Blame It (remix) (feat. T-Pain & Rosco)


Categories: Web Development

Over the past two weeks I participated in panels at both SXSW and MIX 09 on the growing trend of providing streams of user activities on social sites and aggregating these activities from multiple services into a single experience. Aggregating activities from multiple sites into a single service for the purpose of creating an activity stream is fairly commonplace today and was popularized by FriendFeed. This functionality now exists on many social networking sites and related services including Facebook, Yahoo! Profile and the Windows Live Profile.

In general, the model is to receive or retrieve user updates from a social media site like Flickr and make these updates available on the user's profile on the target social network and share it with the user's friends via an activity stream (or news feed) on the site. The diagram below attempts to capture this many-to-many relationship as it occurs today using some well known services as examples.

The bidirectional arrows are meant to indicate that the relationship can be push-based where the content-based social media site notifies the target social network of new updates from the user or pull-based where the social network polls the site on a regular basis seeking new updates from the target user.

There are two problems that sites have to deal with in this model

  1. Content sites like Flickr have to deal with being polled unnecessarily millions of times a day by social networks seeking photo updates from their users. There is the money quote from last year that FriendFeed polled Flickr 2.7 million times a day to retrieve a total of less than 7,000 updates. Even if they move to a publish-subscribe model, it would mean not only having to track which users are of interest to which social network but also targeting APIs on different social networks that are radically different (aka the beautiful f-ing snowflake API problem).

  2. Social aggregation services like FriendFeed and Windows Live have to target dozens of sites, each with different APIs or schemas. Even in the case where the content sites support RSS or Atom, they often use radically different schemas for representing the same data.
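To put the polling numbers from the first problem in perspective, here is the arithmetic spelled out in Python. The two figures are the ones quoted above; since the quote says "less than 7,000", the ratios are bounds rather than exact values.

```python
polls_per_day = 2_700_000  # FriendFeed polls of Flickr per day, as quoted
updates_per_day = 7_000    # upper bound: "less than 7,000 updates"

# At least this many polls were made for each update actually retrieved.
polls_per_update = polls_per_day / updates_per_day

# At most this share of polls could have returned something new,
# even if every update had arrived in its own poll.
useful_fraction = updates_per_day / polls_per_day

print(f"at least {polls_per_update:.0f} polls per update")
print(f"at most {useful_fraction:.2%} of polls found anything new")
```

That works out to roughly 386 polls for every update, or about a quarter of one percent of requests doing useful work.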

The approach I've been advocating along with others in the industry is that we need to adopt standards for activity streams in a way that reduces the complexity of this many-to-many conversation that is currently going on between social sites.

While I was at SXSW, I met one of the folks from Gnip who is advocating an alternate approach. He argued that even with activity stream standards we've only addressed part of the problem. Such standards may mean that FriendFeed gets to reuse their Flickr code to poll Smugmug with little to no changes but it doesn't change the fact that they poll these sites millions of times a day to get a few thousand updates.

Gnip has built a model where content sites publish updates to Gnip and social networking sites can then choose to either poll Gnip or receive updates from Gnip when the update matches one of the rules they have created (e.g. notify us if you get a digg vote from Carnage4Life). The following diagram captures how Gnip works.

The benefit of this model to content sites like Flickr is that they no longer have to worry about being polled millions of times a day by social aggregation services. The benefit to social networking sites is that they now get a consistent format for data from the social media sites they care about and can choose to either pull the data or have it pushed to them.
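A toy version of the hub model helps show why it removes the polling load from content sites. The sketch below is my own simplification in Python and is not Gnip's actual API: the class, its methods and the rule shape (match on source and actor) are all assumptions.

```python
from collections import defaultdict


class ActivityHub:
    """Toy hub: publishers push once, subscribers are matched by rules."""

    def __init__(self):
        self._rules = defaultdict(list)   # (source, actor) -> callbacks
        self._recent = defaultdict(list)  # source -> events, for pollers

    def subscribe(self, source, actor, callback):
        """Register a rule: call back when `actor` does something on `source`."""
        self._rules[(source, actor)].append(callback)

    def publish(self, source, actor, activity):
        """Called by the content site once per update."""
        event = {"source": source, "actor": actor, "activity": activity}
        self._recent[source].append(event)
        for callback in self._rules.get((source, actor), []):
            callback(event)  # push to subscribers whose rule matches

    def poll(self, source):
        """Pull alternative: return everything seen so far for a source."""
        return list(self._recent[source])


hub = ActivityHub()
inbox = []  # stands in for a social network's ingestion endpoint
hub.subscribe("digg", "Carnage4Life", inbox.append)

hub.publish("digg", "Carnage4Life", "vote")
hub.publish("digg", "someone_else", "vote")
```

The content site makes one `publish` call per update regardless of how many aggregators care about it, and each aggregator sees events in a single shape whether it subscribes or polls. The flip side is visible too: everything flows through the one `hub` object.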

The main problem I see with this model is that it sets Gnip up to be a central point of failure, and I'd personally rather interact directly with the content services than inject a middleman into the process. However, I can see how their approach would be attractive to many sites that might be buckling under the load of being constantly polled and to social aggregation sites that are tired of hand coding adapters for each new social media site they want to integrate with.

What do you think of Gnip's service and the problem space in general?

Now Playing: Eamon - F**k It (I Don't Want You Back)


On Thursday the 19th of March there was a panel on activity feeds like you find on Twitter & Facebook and social aggregation services like Friendfeed. The panelists were Kevin Marks (@kevinmarks) of Google, John McCrea (@johnmccrea) of Plaxo, Monica Keller (@ciberch) of MySpace, Luke Shepard (@lukeshepard) of Facebook and Marc Canter (@marccanter4real) of Broadband Mechanics. Yours truly was the moderator.

Although the turnout wasn't massive given it wasn't the run of the mill content for MIX 09, the audience was very engaged and we had almost 45 minutes of Q&A until we ran out of time. You can find the video here and also embedded below if you have Silverlight installed.


Now Playing: Jodeci - My Heart Belongs to You


Categories: Social Software | Trip Report

Facebook's latest redesign, which has clearly been inspired by Twitter's real-time stream of status updates, has had a ton of detractors from all corners. One of the biggest places where the outcry has centralized is the Facebook Layout vote application, which has currently had over a million votes from Facebook users with over 94% against the new changes and almost 600,000 comments, most of which seem to be negative if the hundred or so I read were a representative sample.

One thing I've wondered is how the folks at Facebook are taking this feedback. On the one hand, people don't like changes and the more disruptive the change the more they fight it. It's almost comical to go back and read the Time magazine article about the backlash against the news feed from back in 2006 given how much the feature has not only ended up defining Facebook but how significantly it has impacted the social software landscape at large. On the other hand, sometimes people have a good reason to protest, such as the outcry against the privacy destroying Facebook Beacon which eventually inspired a mea culpa from Zuckerberg himself.

Owen Thomas from Valleywag has an article entitled Even Facebook Employees Hate the Redesign which contains the following excerpt

The feedback on Facebook's new look, which emphasizes a stream of Twitter-like status updates, is almost universally, howlingly negative. Why isn't CEO Mark Zuckerberg listening to users? Because he doesn't have to, he's told employees.

A tipster tells us that Zuckerberg sent an email to Facebook staff reacting to criticism of the changes: "He said something like 'the most disruptive companies don't listen to their customers.'" Another tipster who has seen the email says Zuckerberg implied that companies were "stupid" for "listening to their customers." The anti-customer diktat has many Facebook employees up in arms, we hear.

When your application becomes an integral part of your customers' lives and identities, it is almost expected that they protest any major changes to the user experience. The problem is that you may eventually become jaded about negative feedback because you assume that for the most part the protests are simply people's natural resistance to change.

I tend to agree that disruptive companies don't listen to their customers in the narrow sense of using them as a barometer to decide what products or features to build. Customer feature requests aren't the source of input that would spawn a Netflix in a world that had Blockbuster & Hollywood video. Such disruptive products are spawned from understanding the customers better than they understand themselves. If you had simply "listened" to Blockbuster's customers you'd think the best way to compete with them would be to have cheaper late fees or a bigger selection in your store. Netflix actually went a step further and understood the underlying customer problems (e.g. even going to a video store is a hassle which is why you end up with late fees in the first place) and created a product that was truly disruptive.

Using that model of "disruptive companies", the question then is whether the new Facebook is an example of understanding your customers better than they understand themselves or is truly a mistake. For my take on the answer I'll first point out a comment on Valleywag on the redesign

Here's the problem with the redesign. Twitter is a micro-blog. The 140 character Livejournal.

Facebook is not a blog. In its old form it was a really great PHONEBOOK. A phonebook that not only updated your acquaintance's (most FB friends are not friends) contact info, but also gave you a general summary of their life. It was a big picture kind of thing: Where they are, who they're dating, what school or job they have, and how to contact them. It was never about "sharing" your daily thoughts on how great your panini was or omg gossip girl is back! The livejournal twit-blog crap is messing up the phonebook interface.

This is the crux of the problem with the Facebook redesign. The expectations around how user relationships were created on Twitter are totally different from how they were on Facebook. On Twitter, users explicitly decide as part of following someone that they want all of the person's tweets in their stream. In fact, this is the only feature of the relationship on Twitter. On Facebook, you have relationships with people that attempt to mirror your real life so you have your boss, coworkers, school friends and acquaintances all trying to be part of your social graph because FB is really a kind of "rolodex" in the sky.

The fact that you got a news feed was kind of a side effect of filling out your virtual rolodex but it was cool because you got the highlights of what was going on in the lives of your friends and family. There is a legitimate problem that you weren't getting the full gist of everything your 120 contacts (average number of Facebook friends) were doing online but it would clearly lead to information overload to get up to the minute updates about the breakfast habits of some guy who sat next to you in middle school.

Somewhere along the line, it seems the folks at Facebook didn't internalize this fundamental difference in the social context that differentiates user to user relationships on Twitter versus Facebook. This to me is a big mistake.

Now Playing: Goodie Mob - They Don't Dance No Mo'


Categories: Social Software

March 18, 2009
@ 12:31 PM

I still dream about being back in the Nigerian military school which I attended two decades ago. Wow, two decades.

Wow, even trippier is that the school has a web site.

Now Playing: Supremes - Stop! In The Name Of Love


Categories: Personal

Our mystery panelist has been unveiled.

Standards for Aggregating Activity Feeds and Social Aggregation Services MIX09-T28F

Thursday March 19 | 2:30 PM-3:45 PM | Lando 4201

By: Marc Canter, Monica Keller, Kevin Marks, John McCrea, Dare Obasanjo, Luke Shepard Tags: Services

Come hear a broad panel discussion about aggregating social feeds and services from leading people and companies in this rapidly evolving area including Dare Obasanjo from Microsoft as panel moderator, John McCrea from Plaxo, Kevin Marks from Google, Luke Shepard from Facebook, Marc Canter from Broadband Mechanics, and Monica Keller from MySpace.

You can read my previous post to learn more about what you can expect will be discussed at the panel.

Now Playing: Metallica - ...And Justice For All


Categories: Social Software | Trip Report