Paul Buchheit of FriendFeed has written up a proposal for a new protocol that Web sites can implement to reduce the load on their services from social network aggregators like FriendFeed and SocialThing. He unveils his proposal in his post Simple Update Protocol: Fetch updates from feeds faster, which is excerpted below:

When you add a web site like Flickr or Google Reader to FriendFeed, FriendFeed's servers constantly download your feed from the service to get your updates as quickly as possible. FriendFeed's user base has grown quite a bit since launch, and our servers now download millions of feeds from over 43 services every hour.

One of the limitations of this approach is that it is difficult to get updates from services quickly without FriendFeed's crawler overloading other sites' servers with update checks. Gary Burd and I have thought quite a bit about ways we could augment existing feed formats like Atom and RSS to make fetching updates faster and more efficient. Our proposal, which we have named Simple Update Protocol, or SUP, is below.
...
Sites wishing to produce a SUP feed must do two things:

  • Add a special <link> tag to their SUP enabled Atom or RSS feeds. This <link> tag includes the feed's SUP-ID and the URL of the appropriate SUP feed.
  • Generate a SUP feed which lists the SUP-IDs of all recently updated feeds.
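As a rough illustration, here is a sketch in Python of what the producer side of those two steps might emit. The rel URI, the fragment-based SUP-ID, and the JSON layout follow my reading of FriendFeed's proposal, but the exact details should be checked against the spec; the values are hypothetical.

```python
import json
import time

def sup_link_tag(sup_id, sup_feed_url):
    # The SUP-ID rides along as a URL fragment on the SUP feed's URL,
    # so consumers can read it straight out of the Atom/RSS feed.
    return ('<link rel="http://api.friendfeed.com/2008/03#sup" '
            'type="application/json" href="%s#%s" />' % (sup_feed_url, sup_id))

def sup_feed(updated_sup_ids, period=60):
    # A minimal SUP feed: just the SUP-IDs of recently updated feeds,
    # each paired with an update timestamp.
    now = int(time.time())
    return json.dumps({
        "updated_time": now,
        "period": period,  # seconds of updates covered by this document
        "updates": [[sup_id, str(now)] for sup_id in updated_sup_ids],
    })
```

The key property is that the SUP feed is one small, cacheable document: thousands of consumers can poll that single URL instead of millions of individual feeds.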

Feed consumers can add SUP support by:

  • Storing the SUP-IDs of the Atom/RSS feeds they consume.
  • Watching for those SUP-IDs in their associated SUP feeds.
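The consumer side of those two steps might look something like the following sketch. The function names and the JSON shape are illustrative assumptions, not FriendFeed's actual code; the SUP-ID is assumed to be carried in the fragment of the SUP `<link>` href.

```python
import json
from urllib.parse import urldefrag

# Maps SUP-ID -> the Atom/RSS feed URL we subscribed to.
subscriptions = {}

def remember_feed(feed_url, sup_link_href):
    # Step 1: when a feed is first fetched, pull the SUP-ID out of the
    # fragment of the SUP <link> href and store it.
    _sup_feed_url, sup_id = urldefrag(sup_link_href)
    subscriptions[sup_id] = feed_url

def feeds_to_repoll(sup_feed_json):
    # Step 2: any SUP-ID in the SUP feed that matches one of our stored
    # subscriptions means that feed should be re-fetched now.
    updated = {sup_id for sup_id, _token in json.loads(sup_feed_json)["updates"]}
    return [url for sup_id, url in subscriptions.items() if sup_id in updated]
```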

By using SUP-IDs instead of feed URLs, we avoid having to expose the feed URL, avoid URL canonicalization issues, and produce a more compact update feed (because SUP-IDs can be a database ID or some other short token assigned by the service). Because it is still possible to miss updates due to server errors or other malfunctions, SUP does not completely eliminate the need for polling.

Although there's a healthy conversation about SUP going on in FriendFeed in response to one of my tweets, I thought it would be worth sharing some thoughts on this with a broader audience.

The problem statement that FriendFeed's SUP addresses is the following issue raised in my previous post When REST Doesn't Scale, XMPP to the Rescue?

On July 21st, FriendFeed had 45,000 users who had associated their Flickr profiles with their FriendFeed account. FriendFeed polls Flickr about once every 20 – 30 minutes to see if the user has uploaded new pictures. However only about 6,000 of those users logged into Flickr that day, let alone uploaded pictures. Thus there were literally millions of HTTP requests made by FriendFeed that were totally unnecessary.

FriendFeed's proposal is similar to the Six Apart Update Stream and the Twitter XMPP Firehose in that it is a data stream containing information about all of the updates users are making on a particular service. It differs in a key way: it doesn't actually contain the data from the user updates, just identifiers that can be used to determine which users have changed so that their feeds can be re-polled.

This approach also protects feeds that rely on security through obscurity, such as Google Reader's Shared Items feed and Netflix's Personalized Feeds. The user shares their "secret" feed URL with FriendFeed, which then obtains the SUP-ID of the user's feed when the feed is first polled. Then whenever that SUP-ID is seen in the SUP feed, FriendFeed knows to go re-poll the user's "secret" feed URL.

For services that are getting a ton of traffic from social network aggregators or Web-based feed readers, it does make sense to provide some sort of update stream or fire hose to reduce the amount of polling that gets done. It also makes sense that if more and more services are going to provide such update streams, they should be standardized so that social network aggregators and similar services do not end up having to target multiple update protocols.

I believe that in the end we will see a continuum of options in this space. The vast majority of services will be OK with the load generated by social network aggregators and Web-based feed readers polling their feeds. These services won't see the point of building additional features to handle this load. Some services will run the numbers like Twitter & Six Apart have done and will provide update streams in an attempt to reduce the impact of polling. For these services, SUP seems like a somewhat elegant solution, and it would be good to standardize on something; anything at all is better than each site having its own custom solution. For a smaller set of services, this still won't be enough since they don't provide feeds (e.g. Blockbuster's use of Facebook Beacon), and you'll need an explicit server-to-server publish-subscribe mechanism. XMPP, or perhaps an HTTP-based publish-subscribe mechanism like the one Joshua Schachter proposed a few weeks ago, will be the best fit for those scenarios.

Now Playing: Jodeci - I'm Still Waiting


 

Friday, 29 August 2008 11:33:55 (GMT Daylight Time, UTC+01:00)
OK. So as I understand it, the main problem is that

GET /blog/feed

returns the entire feed and so increases load. There is a perfectly RESTful solution to this already built into HTTP:

HEAD /blog/feed

In the request header, one could include If-Modified-Since and check for a 304 response code. Alternatively one could just monitor the Last-Modified response header. Of course, the main problem with this is that currently most servers don't handle HEAD requests. It is my opinion that this is the area we should be looking at fixing, not creating additional competing (and very limited in scope) protocols.
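[The conditional-fetch idea described above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not production code; note that it uses a conditional GET rather than HEAD, since a 304 response already suppresses the body.]

```python
import urllib.error
import urllib.request

def fetch_if_modified(url, last_modified=None):
    # Send If-Modified-Since when we have a previous Last-Modified value;
    # a 304 Not Modified response means the feed is unchanged and the
    # server sends no body at all.
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # nothing new since our last fetch
            return None, last_modified
        raise
```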
Friday, 29 August 2008 13:35:49 (GMT Daylight Time, UTC+01:00)
Tom,
The point is that even with conditional GET requests via If-Modified-Since/If-None-Match (which are a better approach than using HEAD), the large number of unnecessary connections still has a cost. Instead of FriendFeed polling Flickr every 20 - 30 minutes for 45,000 users and making 3 million HTTP requests to Flickr, they could poll the SUP feed every 5 seconds and then fetch the feed of any user that showed up in it. This would reduce the number of HTTP requests from FriendFeed to Flickr from 3 million to fewer than 100,000.
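[A quick back-of-the-envelope check of those numbers, using the figures quoted from the earlier post; the 25-minute average poll interval is my own assumption within the stated 20 - 30 minute range.]

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Old model: ~45,000 Flickr feeds, each polled roughly every 25 minutes.
polls_per_feed_per_day = SECONDS_PER_DAY // (25 * 60)  # ~57 polls/day
old_requests = 45_000 * polls_per_feed_per_day         # ~2.6 million/day

# SUP model: poll the single SUP feed every 5 seconds, then fetch only
# the ~6,000 feeds that actually changed that day (per the excerpt above).
new_requests = SECONDS_PER_DAY // 5 + 6_000            # ~23,000/day

print(old_requests, new_requests)
```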

My counter argument would be that Flickr already has to deal with even more load from various RSS readers, search crawlers and widgets that use their API. So doing extra work to optimize this one case may not give them that much bang for the buck when placed in the context of all the load being placed on their system.
Saturday, 30 August 2008 11:51:16 (GMT Daylight Time, UTC+01:00)
It looks useful, but I think the SUP feed becomes its own bottleneck. You'll have to cache it on the server. And you'll have to hit it fast on the client so you don't miss anything. It's a "Four More Years" play.
Wednesday, 03 September 2008 00:52:52 (GMT Daylight Time, UTC+01:00)
Internet-scale polling doesn't work. The ratio of productive to unproductive polls will continue to cause blue whales until we move to a push model, such as the one I described here

http://blog.pasker.net/2008/04/14/a-universal-messaging-hub/