My job at Microsoft is working on the contacts platform that is utilized by a number of Windows Live services. The contacts platform is a unified graph of the relationships our users have created across Windows Live. It includes a user's Windows Live Hotmail contacts, their Windows Live Spaces friends, their Windows Live Messenger buddies and anyone they've added to an access control list (e.g. people who can access their shared folders in Windows Live Skydrive or the events in their calendar). Basically, a while ago one of our execs thought it didn't make sense to build a bunch of social software applications each acting as a silo of user relationships and that instead we should have a unified graph of the user to user relationships within Windows Live. Fast forward a couple of years and we now have a clearer idea of the pros and cons of building a unified social graph.

Given the above, it should be no surprise that I read Brad Fitzpatrick's Thoughts on the Social Graph with keen interest since it overlaps significantly with my day job. I was particularly interested in the outlined goals for the developers API which are included below

For developers who don't want to do their own graph analysis from the raw data, the following high-level APIs should be provided: 

  1. Node Equivalence, given a single node, say "brad on LiveJournal", return all equivalent nodes: "brad" on LiveJournal, "bradfitz" on Vox, and 4caa1d6f6203d21705a00a7aca86203e82a9cf7a (my FOAF mbox_sha1sum). See the slides for more info.
  2. Edges out and in, by node. Find all outgoing edges (where edges are equivalence claims, equivalence truths, friends, recommendations, etc). Also find all incoming edges.
  3. Find all of a node's aggregate friends from all equivalent nodes, expand all those friends' equivalent nodes, and then filter on destination node type. This combines steps 1 and 2 and 1 in one call. For instance, Given 'brad' on LJ, return me all of Brad's friends, from all of his equivalent nodes, if those [friend] nodes are either 'mbox_sha1sum' or 'Twitter' nodes.
  4. Find missing friends of a node. Given a node, expand all equivalent nodes, find aggregate friends, expand them, and then report any missing edges. This is the "let the user sync their social networking sites" API. It lets them know if they were friends with somebody on Friendster and they didn't know they were both friends on MySpace, they might want to be.

here are the top three problems Brad and the rest of the Google folks working on this project will have to factor in as they chase the utopia that is a unified social graph.
  1. Some Parts of the Graph are Private: Although social networking sites with publicly articulated social networks are quite popular (e.g. MySpace) there are a larger number of private or semi-private social networks that either can only be viewed by the owner of the list (e.g. IM buddy lists) or some subset of the graph (e.g. private profiles on social networking sites MySpace, Facebook, Windows Live Spaces, etc). The latter is especially tricky to deal with. In addition, people often have more non-public articulated social networks (i.e. friends lists) than public ones despite the popularity of social networking sites with public profiles.

  2. Inadvertent Information Disclosure caused by Linking Nodes Across Social Networks: The "find missing friends of a node" feature in Brad's list sounds nice in theory but it includes a number of issues that users often consider to be privacy violations or just plain creepy. Let's say, I have Batman on my friend's list in MySpace because I think the caped crusader is cool. Then I join LiveJournal and it calls the find_missing_friends() API to identify which of my friends from other sites are using LiveJournal and it find's Bruce Wayne's LiveJournal? Oops, an API call just revealed Batman's secret identity. A less theoretical version of this problem occurred when we first integrated Windows Live Spaces with Windows Live Messenger, and some of our Japanese beta users were stunned to find that their supposedly anonymous blog postings were now a click away for their IM buddies to see. I described this situation briefly in my submission to the 2005 Social Computing Symposium.

  3. All "Friends" aren't Created Equal: Another problem is that most users don't want all their "friends" available in all their applications. One capability we were quite proud off at one time is that if you had Pocket MSN Messenger then we merged the contacts on your cell phone with your IM and email contacts. A lot of people were less than impressed by this behavior. Someone you have on your IM buddy list isn't necessarily someone you want in your cell phone address book. Over the years, I've seen more examples of this than I can count. Being "friends" in one application does not automatically mean that two users want to be "friends" in a completely different context.

These are the kinds of problems we've had to deal with on my team while also trying to make this scale to being accessed by services utilized by hundreds of millions of users. I've seen what it takes to build a system like this first hand and Brad & company have their work cut out for them. This is without considering the fact that they may have to deal with ticked of users or ticked off social networking sites depending on how exactly they plan to build this giant database of user friend lists.

PS: In case any of this sounds interesting to you, we're always hiring. :)


 

Monday, 20 August 2007 18:22:27 (GMT Daylight Time, UTC+01:00)
Yeah, I think you've outlined the 3 major obstacles in brad's vision. I think that ultimately, his perspective is interesting because it bites off a really small chunk, namely matching social networking identities based on some kind of primary identity key. So this is a nice contribution, but as you point out, there's some funky privacy stuff involved. I would also add a fourth bullet here that 4) all identities are not created equal. this has a lot to do with part three here, but I think there is at least some assumption here that people maintain just one online identity, or that they should just claim one identity, is problematic and needs to be addressed on some level.
Comments are closed.