There was an interesting presentation at OSCON 2008 by Evan Henshaw-Plath and Kellan Elliott-McCrea entitled "Beyond REST? Building Data Services with XMPP PubSub".

The core argument behind the presentation can be summarized by this tweet from Tim O'Reilly:

On monday friendfeed polled flickr nearly 3 million times for 45000 users, only 6K of whom were logged in. Architectural mismatch. #oscon08

On July 21st, FriendFeed had 45,000 users who had associated their Flickr profiles with their FriendFeed account. FriendFeed polls Flickr about once every 20 to 30 minutes per user to see if that user has uploaded new pictures. However, only about 6,000 of those users logged into Flickr that day, let alone uploaded pictures. Thus FriendFeed made literally millions of HTTP requests that were totally unnecessary.
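As a rough sanity check on those numbers, here is a back-of-the-envelope sketch in Python. The user counts and the polling interval come from the paragraph above. The function name and the assumption that every linked user is polled on a fixed schedule for a full 24 hours are mine, so treat the output as an estimate rather than FriendFeed's actual behavior.

```python
# Back-of-the-envelope estimate of FriendFeed's daily polling load on Flickr.
# User counts and intervals are from the post; the fixed 24-hour schedule
# and all names here are illustrative assumptions.

MINUTES_PER_DAY = 24 * 60


def daily_polls(users: int, interval_minutes: int) -> int:
    """Requests per day if every user is polled once per interval."""
    return users * (MINUTES_PER_DAY // interval_minutes)


linked_users = 45_000   # FriendFeed users with a linked Flickr profile
active_users = 6_000    # of those, users who logged into Flickr that day

low = daily_polls(linked_users, 30)    # polling every 30 minutes
high = daily_polls(linked_users, 20)   # polling every 20 minutes

# Polls for users who never logged in cannot have found new uploads.
wasted_low = daily_polls(linked_users - active_users, 30)
wasted_high = daily_polls(linked_users - active_users, 20)

print(f"total polls per day:  {low:,} to {high:,}")
print(f"polls for idle users: {wasted_low:,} to {wasted_high:,}")
```

Under these assumptions the total lands between roughly 2.2 and 3.2 million requests a day, which is consistent with the "nearly 3 million" figure in the tweet.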

Evan and Kellan's talk suggests that instead of Flickr getting almost 3 million requests from FriendFeed, it would be more efficient for FriendFeed to tell Flickr which users it is interested in and then listen for updates from Flickr when those users upload photos.

They are right. The interaction between Flickr and FriendFeed should actually be a publish-subscribe relationship instead of a polling relationship. Polling is a good idea for RSS/Atom for a few reasons:

  • there are thousands to hundreds of thousands of clients that might be interested in a resource, so having the server keep track of subscriptions is prohibitively expensive
  • a lot of these end points aren't persistently connected (i.e. your desktop RSS reader isn't always running)
  • RSS/Atom publishing is as simple as plopping a file in the right directory and letting IIS or Apache work its magic

The situation between FriendFeed and Flickr is almost the exact opposite. Instead of thousands of clients interested in one document, we have one subscriber interested in thousands of documents. Both end points are always on or are at least expected to be. The cost of developing a publish-subscribe model is one that both sides can afford.
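To make the contrast with polling concrete, here is a minimal in-process sketch of the publish-subscribe relationship in Python. It is not XMPP and not anything Flickr or FriendFeed actually expose. `PhotoPublisher`, `subscribe`, `publish`, the user names and the example URLs are all hypothetical and exist only to show the shape of the interaction.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, str], None]  # (user_id, photo_url) -> None


class PhotoPublisher:
    """Hypothetical stand-in for the publishing side (the Flickr role)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, user_id: str, callback: Callback) -> None:
        """Register interest in one user's uploads."""
        self._subscribers[user_id].append(callback)

    def publish(self, user_id: str, photo_url: str) -> int:
        """Notify subscribers of a new upload; return notifications sent."""
        callbacks = self._subscribers.get(user_id, [])
        for callback in callbacks:
            callback(user_id, photo_url)
        return len(callbacks)


# The subscribing side (the FriendFeed role) just records what it is told.
received: List[Tuple[str, str]] = []


def on_upload(user_id: str, photo_url: str) -> None:
    received.append((user_id, photo_url))


publisher = PhotoPublisher()
for user in ("alice", "bob"):
    publisher.subscribe(user, on_upload)

publisher.publish("alice", "http://example.com/photos/1")  # delivered
publisher.publish("carol", "http://example.com/photos/2")  # no subscriber
print(received)
```

The point of the sketch is that traffic is proportional to the number of uploads, not to the number of subscribed users multiplied by the polling frequency.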

Thus this isn't a case of REST not scaling, as implied by Evan and Kellan's talk. This is a case of using the wrong tool to solve your problem because it happens to work well in a different scenario. The above talk suggests using XMPP, which is an instant messaging protocol, as the publish-subscribe mechanism. In response to the talk, Joshua Schachter (founder of del.icio.us) suggested a less heavyweight publish-subscribe mechanism using a custom API in his post entitled "beyond REST". My suggestion for people who believe they have this problem would be to look at using some subset of XMPP and to experiment with off-the-shelf tools before rolling their own solution.

Of course, this is an approach that totally depends on network effects. Today everyone has RSS/Atom feeds while very few services use XMPP. There isn't much point in investing in publishing via XMPP if your key subscribers can't consume it, and vice versa. It will be interesting to see if the popular "Web 2.0" companies can lead the way in using XMPP for publish-subscribe of activity streams from social networks in the same way they kicked off our love affair with RESTful Web APIs.

It should be noted that there are already some "Web 2.0" companies using XMPP as a way to provide a stream of updates to subscribing services to prevent the overload that comes from polling. For example, Twitter has confirmed that it provides an XMPP stream to FriendFeed, Summize, Zappos, Twittervision and Gnip. However, it simply dumps out every update that occurs on Twitter to these services instead of having these services subscribe to updates for specific users. This approach is quite inefficient and brings its own set of scaling issues.
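The difference between the two delivery styles can be sketched in a few lines of Python. The update stream, the author names and both function names are made up for illustration and say nothing about how Twitter's XMPP stream is actually implemented.

```python
# Hypothetical update stream: (author, message) pairs.
updates = [("alice", "hi"), ("bob", "lunch"), ("carol", "oscon"), ("alice", "bye")]
followed = {"alice"}  # authors this consumer actually cares about


def firehose(stream, wanted_authors):
    """Publisher sends everything; the consumer filters after delivery."""
    delivered = list(stream)
    wanted = [u for u in delivered if u[0] in wanted_authors]
    return len(delivered), wanted


def per_user(stream, subscriptions):
    """Publisher filters by subscription before sending anything."""
    delivered = [u for u in stream if u[0] in subscriptions]
    return len(delivered), delivered


sent_all, wanted_all = firehose(updates, followed)
sent_sub, wanted_sub = per_user(updates, followed)
print(sent_all, sent_sub)  # messages on the wire in each model
```

Both consumers end up with the same two updates, but in the firehose model twice as many messages cross the wire in this toy stream, and the gap grows with the share of authors nobody downstream follows.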

The interesting question is why people are only now bringing this up. Shouldn't people have already been complaining about Web-based feed readers like Google Reader and Bloglines for causing the same kinds of problems? I can only imagine how many millions of times a day Google Reader must fetch content from TypePad and Wordpress.com, but I haven't seen explicit complaints about this issue from folks like Anil Dash or Matt Mullenweg.

Now Playing: The Pussycat Dolls - When I Grow Up


 

Sunday, July 27, 2008 6:16:22 PM (GMT Daylight Time, UTC+01:00)
I think the reason people haven't complained before is that HTTP actually handles polling pretty well if both the client and server follow protocol. If the client asks for a resource that hasn't changed, the server sends back a 304 Not Modified with an empty body, indicating to the client that they can reuse the old response (assuming they've responsibly cached it on their side). If the server implementation is such that returning a 304 is an order of magnitude cheaper than returning a computed response of 200, then polling will scale just fine.
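A minimal sketch of that conditional GET exchange in Python, simulated in-process rather than over a real socket. It assumes the validator is an `ETag` echoed back in `If-None-Match`, which is one of the mechanisms HTTP defines for this. `etag_for`, `handle_get` and `PollingClient` are hypothetical names, and a real server would store the tag instead of rehashing the body on every request, since the whole point is that the 304 path must be cheap.

```python
import hashlib
from typing import Dict, Optional, Tuple


def etag_for(body: bytes) -> str:
    """Derive a validator from the body (a real server would cache this)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Simulated server: 304 with an empty body if the client's copy is current."""
    tag = etag_for(body)
    if request_headers.get("If-None-Match") == tag:
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag}, body


class PollingClient:
    """Simulated client that caches the last response and its ETag."""

    def __init__(self) -> None:
        self.etag: Optional[str] = None
        self.cached: Optional[bytes] = None

    def poll(self, current_feed: bytes) -> Tuple[int, Optional[bytes]]:
        headers = {"If-None-Match": self.etag} if self.etag else {}
        status, response_headers, body = handle_get(current_feed, headers)
        if status == 200:  # new content: refresh the cache
            self.etag = response_headers["ETag"]
            self.cached = body
        return status, self.cached  # on 304 the cached copy is reused


client = PollingClient()
print(client.poll(b"<feed>v1</feed>")[0])  # first fetch
print(client.poll(b"<feed>v1</feed>")[0])  # unchanged feed
print(client.poll(b"<feed>v2</feed>")[0])  # feed was updated
```

Only the first and third polls transfer a body. The second costs the server a header comparison and nothing more, provided the tag lookup itself is cheap.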

That said, I'm very interested in the pubsub model that XMPP promises. As for clients not being able to consume XMPP, there's a compromise: Atom-PubSub over XMPP. I was excited when I heard about this ejabberd module, but it still feels very alpha, in that I couldn't figure out how to get it installed. It's progress though!

.Carlo
Monday, July 28, 2008 7:36:12 AM (GMT Daylight Time, UTC+01:00)
XMPP is a reasonable choice for this, but I don't buy the argument that HTTP isn't suited for it. If Flickr allowed you to declare who you're interested in and then you could poll that one resource (instead of multiple thousands), it would be fine.

Oh, wait, they do..

http://www.flickr.com/services/api/flickr.photos.getContactsPhotos.html

:)
Monday, July 28, 2008 9:19:52 AM (GMT Daylight Time, UTC+01:00)
It looks like what you ACTUALLY want is a real message oriented middleware system. That's not XMPP.
Monday, July 28, 2008 2:54:19 PM (GMT Daylight Time, UTC+01:00)
This talk is confusing to say the least.

REST != polling. And their biggest argument against REST is polling. You can use a push model with REST as well (the technique is called Comet). In fact, my understanding is that BOSH (XMPP's HTTP layer) is based on Comet* as well.

* http://metajack.wordpress.com/2008/07/02/xmpp-is-better-with-bosh/
phil poker
Monday, July 28, 2008 8:10:57 PM (GMT Daylight Time, UTC+01:00)
I think Google Reader checks an RSS feed anywhere between once every minute and once every 15 minutes. But I'd expect the number of subscribers does not really matter: the feed won't be different if Reader has one subscriber or 1000 for that feed.

That said, had it been available, I bet they would have used a pub/sub model. It fits the requirements much better.
eelcoh
Sunday, August 3, 2008 8:58:43 AM (GMT Daylight Time, UTC+01:00)
regarding the last question... it is entirely possible that, even being polled several million times a day, those services don't suffer from this hit.
the main point here is, IMHO:
a) when you CAN reach an agreement between such two parties, it is a GOOD THING to give up polling and receive notifications that something has changed - everybody WOULD do this if it could;
b) if you CAN'T reach such an agreement, you don't have much choice but polling.
it is more of a political choice than a technical one, in the end...
anyway, I always read your feed and had not noticed the new layout - looks VERY nice (way nicer than the old one), but it took me a minute to find the "comments" link next to the title instead of at the bottom of the post...
keep up the great work!