If you hang around technology blogs and news sites, you may have seen the recent dust up after it was discovered that many iOS apps upload user address books to their servers to perform friend finding operations. There has been some outrage about this because this has happened silently in the majority of cases. The reasons for doing this typically are not nefarious. Users sign up to the service and provide their email address as a username or contact field. This is used to uniquely identify the user. Then when one of the user’s friends joins the service, the app asks the friend for their contact list then finds all of the existing users they whose email addresses are in the new users list of contacts. Such people you may know features are the bread and butter of growing the size of a social graph and connectedness in social applications.

There are a number of valid problems with this practice and the outrage has focused on one of them; apps were doing this without explicitly telling users what they were doing and then permanently storing these contact lists. However there are other problems as well that get into tricky territory around data ownership and data portability. The trickiest being whether it is OK for me to share your email address and other contact details with another entity without your permission given it identifies you. If you are a private individual with an unlisted phone number and private email address only given to a handful of people, it seems tough to concede that it is OK for me to share this information with any random app that asks me especially if you have this information as private for a reason (e.g. celebrity, high ranking government official, victim of a crime, etc).

As part of my day job leading the program management team behind the Live Connect developer program which among other things provides access to the Hotmail and Messenger contact lists, these sorts of tricky issues where one has to balance data portability and user privacy are top of mind.  I was rather pleased this morning to read a blog post titled Hashing for privacy in social apps by Matt Gemmell which advocates the approach we took with Live Connect. Matt writes

Herein lies the (false) dilemma: you’re getting a handy social feature (automatic connection with your friends), but you’re losing your privacy (by allowing your friends’ email addresses to be uploaded to Path’s servers). As a matter of fact, your friends are also losing their privacy too.

What an awful choice to have to make! If only there was a third option!

For fun, let’s have a think about what that third option would be.

Mathematics, not magic

Hypothetically, what we want is something that sounds impossible:

  1. Some way (let’s call it a Magic Spell) to change some personal info (like an email address) into something else, so it no longer looks like an email address and can’t be used as one. Let’s call this new thing Gibberish.
  2. It must be impossible (or at least very time-consuming) to change the Gibberish back into the original email address (i.e. to undo the Magic Spell).
  3. We still need a way to compare two pieces of Gibberish to see if they’re the same.

Clearly, that’s an impossible set of demands.

Except that it’s not impossible at all.

We’ve just described hashing, which is a perfectly real and readily-available thing. Unlike almost all forms of magic, it does actually exist - and like all actually-existing forms of magic, it’s based entirely on mathematics. Science to the rescue!

This is the practice we’ve advocated with Live Connect as well. Instead of returning email addresses of a user’s contacts from our APIs, we provide email hashes instead. That way applications don’t need to store or match against actual email addresses of our users but can still get all the value of providing a user with the a way to find their Hotmail or Messenger contacts who also use that service.

We also provide code samples that show how to work with these hashes and I remember being in discussions with folks on the team as to whether developers would ever want to do this since storing and comparing email addresses is less code than working with hashes. Our conclusion was that it would be just a matter of time before this would be an industry best practice and it was best if we were ahead of the curve. It will be interesting to see if our industry learns from this experience or whether it will take external pressure.

Note Now Playing: Notorious B.I.G. - Want That Old Thing Back (featuring Ja Rule and Ralph Tresvant) Note