Over the past two weeks I participated in panels at both the SXSW and MIX 09 on the growing trend of provide streams of user activities on social sites and aggregating these activities from multiple services into a single experience. Aggregating activities from multiple sites into a single service for the purpose of creating a activity stream is fairly commonplace today and was popularized by Friendfeed. This functionality now exists on many social networking sites and related services including Facebook, Yahoo! Profile and the Windows Live Profile

In general, the model is to receive or retrieve user updates from a social media site like Flickr and make these updates available on the user's profile on the target social network and share it with the user's friends via an activity stream (or news feed) on the site. The diagram below attempts to capture this many-to-many relationship as it occurs today using some well known services as examples.

The bidirectional arrows are meant to indicate that the relationship can be push-based where the content-based social media site notifies the target social network of new updates from the user or pull-based where the social network polls the site on a regular basis seeking new updates from the target user.

There are two problems that sites have to deal with in this model

  1. Content sites like Flickr have to either deal with being polled unnecessarily millions of times a day by social networks seeking photo updates from their users.  There is the money quote from last year that FriendFeed polled Flickr 2.7 million times a day to retrieve a total of less than 7,000 updates. Even if they move to a publish-subscribe model it would mean not only having to track which users are of interest to which social network but also targeting APIs on different social networks that are radically different (aka the beautiful f-ing snowflake API problem).

  2. Social aggregation services like Friendfeed and Windows Live have to target dozens of sites each with a different APIs or schemas. Even in the case where the content sites support RSS or Atom, they often use radically different schemas for representing the same data

The approach I've been advocating along with others in the industry is that we need to adopt standards for activity streams in a way that reduces the complexity of this many-to-many conversation that is currently going on between social sites.

While I was at SXSW, I met one of the folks from Gnip who is advocating an alternate approach. He argued that even with activity stream standards we've only addressed part of the problem. Such standards may mean that FriendFeed gets to reuse their Flickr code to poll Smugmug with little to no changes but it doesn't change the fact that they poll these sites millions of times a day to get a few thousand updates.

Gnip has built a model where content sites publish updates to Gnip and then social networking sites can then choose to either poll Gnip or receive updates from Gnip when the update matches one of the rules they have created (e.g. notify us if you get a digg vote from Carnage4Life). The following diagram captures how Gnip works.

The benefit of this model to content sites like Flickr is that they no longer have to worry about being polled millions of times a day by social aggregation services. The benefit to social networking sites is that they now get a consistent format for data from the social media sites they care about and can choose to either pull the data or have it pushed to them.

The main problem I see with this model is that it sets Gnip up to be this central point of failure and I'd personally rather deal interact directly with the content services directly instead of inject a middle man into the process. However I can see how their approach would be attractive to many sites who might be buckling under the load of being constantly polled and to social aggregation sites that are tired of hand coding adapters for each new social media sites they want to integrate with. 

What do you think of Gnip's service and the problem space in general?

Note Now Playing: EamonF**k It (I Don't Want You Back) Note