September 14, 2007
@ 03:30 PM

Torsten and I just got an email asking whether RSS Bandit is an abandoned project. This is a fair question given how little activity there has been from us over the past couple of months.

The fact is that we're both pretty busy in our personal lives right now. I'm getting married a week from tomorrow and Torsten just had a baby. Since RSS Bandit is a project we maintain in our free time, it has suffered while we go through transitions on the home front. I'm about a month away from being able to dedicate any serious time to the project and Torsten is in a similar boat.

However, the currently checked-in code is a lot more stable than the current release since we have fixed or worked around a number of issues in Lucene.NET, which we use for building the search index. For that reason, I'll be releasing a new version of RSS Bandit this weekend which should fix a lot of the issues a number of our regular users have had with the last release.

Next month, I'll start thinking about the features I'd like to add in the next release and should be able to start writing new code again. Thanks for your patience and support.


 

Categories: RSS Bandit

Recently I took a look at CouchDB because I saw it favorably mentioned by Sam Ruby and when Sam says some technology is interesting, he’s always right. You get the gist of CouchDB by reading the CouchDB Quick Overview and the CouchDB technical overview.  

CouchDB is a distributed document-oriented database, which means it is designed to be a massively scalable way to store, query and manage documents. Two things are interesting right off the bat: the primary interface to CouchDB is a RESTful JSON API, and queries are performed by creating the equivalent of stored procedures in JavaScript which are then applied to each document in parallel. One thing that is not so interesting is that editing documents is lockless and utilizes optimistic concurrency, which means more work for clients.

As someone who designed and implemented an XML Database query language back in the day, this all seems strangely familiar.
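
To make the lockless editing model concrete, below is a minimal sketch of the read-modify-write cycle in C#. The host, port, database name and the hardcoded revision value are all illustrative placeholders, not taken from the CouchDB docs.

using System;
using System.Net;

class CouchConcurrencySketch
{
    static void Main()
    {
        // Hypothetical local CouchDB instance, database "notes", document "note-1".
        string docUrl = "http://localhost:5984/notes/note-1";

        using (WebClient client = new WebClient())
        {
            // GET returns the document as JSON, including its current revision
            // in the _rev field.
            Console.WriteLine(client.DownloadString(docUrl));

            // An update must echo back the _rev that was read (hardcoded here for
            // brevity). If another client changed the document in the meantime,
            // the server rejects the write with 409 Conflict and this client has
            // to re-read and retry -- the "more work for clients" mentioned above.
            client.Headers[HttpRequestHeader.ContentType] = "application/json";
            try
            {
                client.UploadString(docUrl, "PUT",
                    "{\"_id\":\"note-1\",\"_rev\":\"1-abc\",\"text\":\"updated\"}");
            }
            catch (WebException)
            {
                Console.WriteLine("Conflict: re-fetch the document and retry.");
            }
        }
    }
}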

So far, I like what I’ve seen, but there already seems to be a bunch of incorrect hype about the project which may damage its chances of success if it isn’t checked. Specifically, I’m talking about Assaf Arkin’s post CouchDB: Thinking beyond the RDBMS, which seems chock full of incorrect assertions and misleading information.

Assaf writes

This day, it happens to be CouchDB. And CouchDB on first look seems like the future of database without the weight that is SQL and write consistency.

CouchDB is a document-oriented database, which is nothing new [although focusing on JSON instead of XML makes it buzzword compliant], and it is definitely not a replacement/evolution of relational databases. In fact, the CouchDB folks assert as much in their overview document.

Document-oriented databases work well for semi-structured data where each item is mostly independent and is often processed or retrieved in isolation. This describes a large category of Web applications which are primarily about documents that may link to each other but aren’t often processed or requested based on those links (e.g. blog posts, email inboxes, RSS feeds, etc). However, there are also lots of Web applications that are about managing heavily structured, highly interrelated data (e.g. sites that heavily utilize tagging or social networking) where the document-centric model doesn’t quite fit.

Here’s where it gets interesting. There are no indexes. So your first option is knowing the name of the document you want to retrieve. The second is referencing it from another document. And remember, it’s JSON in/JSON out, with REST access all around, so relative URLs and you’re fine.

But that still doesn’t explain the lack of indexes. CouchDB has something better. It calls them views, but in fact those are computed tables. Computed using JavaScript. So you feed (reminder: JSON over REST) it a set of functions, and you get a set of queries for computed results coming out of these functions.

Again, this is a claim that is refuted by the actual CouchDB documentation. There are indexes; otherwise the system would be ridiculously slow, since it would have to run the function against every single document in the database each time you ran one of these views (i.e. the equivalent of a full table scan). Assaf probably meant to say that there aren’t any relational-database-style indexes, but…it isn’t a relational database, so that isn’t a useful distinction to make.
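
To make that concrete, here is a rough sketch of defining and querying a view over the REST interface. The database name, the function called inside the view (early releases used map, later ones emit) and the URL layout are illustrative and have varied across CouchDB versions; the point is in the comments.

using System;
using System.Net;

class CouchViewSketch
{
    static void Main()
    {
        // A view is JavaScript stored inside a design document. Treat the JSON
        // layout and function names here as illustrative, not authoritative.
        string designDoc =
            "{ \"_id\": \"_design/posts\", \"views\": { \"by_author\": " +
            "\"function(doc) { if (doc.type == 'post') map(doc.author, doc.title); }\" } }";

        using (WebClient client = new WebClient())
        {
            client.Headers[HttpRequestHeader.ContentType] = "application/json";
            client.UploadString("http://localhost:5984/notes/_design/posts", "PUT", designDoc);

            // CouchDB materializes the view into a persistent index the first
            // time it is queried and updates it incrementally afterwards, so a
            // view query is an index lookup, not a scan of every document.
            Console.WriteLine(client.DownloadString(
                "http://localhost:5984/notes/_view/posts/by_author"));
        }
    }
}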

I’m personally convinced that write consistency is the reason RDBMS are imploding under their own weight. Features like referential integrity, constraints and atomic updates are really important in the client-server world, but irrelevant in a world of services.

You can do all of that in the service. And you can do better if you replace write consistency with read consistency, making allowances for asynchronous updates, and using functional programming in your code instead of delegating to SQL.

I read these two paragraphs five or six times and they still seem like gibberish to me. First, it seems silly to say that maintaining data consistency is important in the client-server world but irrelevant in the world of services. Second, “read consistency” and “write consistency” are not an either-or choice. They are both techniques used by database management systems, like Oracle, to present a deterministic and consistent experience when modifying, retrieving and manipulating large amounts of data.

In the world of online services, people are very aware of the CAP conjecture and often choose availability over data consistency, but it is a conscious decision. For example, it is more important for Amazon that their system is always available to users than that it never gets the occasional order wrong. See Pat Helland’s (ex-Amazon architect) example of how a business-centric approach to data consistency may shape one’s views in his post Memories, Guesses, and Apologies, where he writes

#1 - The application has only a single replica and makes a "decision" to ship the widget on Wednesday.  This "decision" is sent to the user.

#2 - The forklift pummels the widget to smithereens.

#3 - The application has no recourse but to apologize, informing the customer they can have another widget in one month (after the incoming shipment arrives).

#4 - Consider an alternate example with two replicas working independently.  Replica-1 "decides" to ship the widget and sends that "decision" to User-1.

#5 - Independently, Replica-2 makes a "decision" to ship the last remaining widget to User-2.

#6 - Replica-2 informs Replica-1 of its "decision" to ship the last remaining widget to User-2.

#7 - Replica-1 realizes that they are in trouble...  Bummer.

#8 - Replica-1 tells User-1 that he guessed wrong.

#9 - Note that the behavior experienced by the user in the first example is indistinguishable from the experience of user-1 in the second example.

Eventual Consistency and Crappy Computers

Business realities force apologies.  To cope with these difficult realities, we need code and, frequently, we need human beings to apologize.  It is essential that businesses have both code and people to manage these apologies.

We try too hard as an industry.  Frequently, we build big and expensive datacenters and deploy big and expensive computers.   

In many cases, comparable behavior can be achieved with a lot of crappy machines which cost less than the big expensive one.

The problem described by Pat isn’t a failure of relational databases vs. document-oriented ones, as Assaf’s implication would have us believe. It is the business reality that availability is more important than data consistency for certain classes of applications. A lot of the culture and technologies of the relational database world are about preserving data consistency [which is a good thing, because I don’t want money going missing from my bank account because someone thought the importance of write consistency is overstated] while the culture around Web applications is about reaching scale cheaply while maintaining high availability, in situations where the occurrence of data loss is unfortunate but not catastrophic (e.g. lost blog comments, mistagged photos, undelivered friend requests, etc).

Even then, most large-scale Web applications that don’t utilize the relational database features meant to enforce data consistency (triggers, foreign keys, transactions, etc) still end up rolling their own app-specific solutions to handle data consistency problems. However, since these are tailored to the application, they are more performant than the generic features which may exist in a relational database.

For further reading, see an Overview of the Flickr Architecture.

Now playing: Raekwon - Guillotine (Swordz) (feat. Ghostface Killah, Inspectah Deck & GZA/Genius)


 

Categories: Platforms | Web Development

I attended the Data Sharing Summit last Friday and it was definitely the best value (with regards to time and money invested) that I've ever gotten out of a conference. You can get an overview of that day’s activities in Marc Canter’s post Live blogging from the DataSharingSummit. I won’t bother with a summary of my impressions of the day since Marc’s post does a good job of capturing the various topics and ideas that were discussed.

What I will discuss is a technology initiative called OAuth which is being cooked up by the Web platform folks at Yahoo!, Google, Six Apart, Pownce, Twitter and a couple of other startups.

It all started when Leah Culver was showing off the profile aggregation feature of Pownce on her Pownce profile. If you look at the bottom left of that page, you’ll notice that it links to her profiles on Digg, Facebook, Upcoming, Twitter as well as her weblog. Being the paranoid sort, I asked whether she wasn’t worried that her users could fall victim to imposters such as this fake profile of Robert Scoble. She replied that her users should be savvy enough to realize that links to profiles in a sidebar don’t prove that the user is actually that person. She likened the feature to blog *bling* as opposed to something that should be taken seriously.

I pressed further and asked whether she didn’t think it would be interesting if a user of Pownce could prove their identity on Twitter via OpenID (see my proposal for social network interoperability based on OpenID for details), so that she could use the Twitter API to post a user’s content from Pownce onto Twitter and show their Twitter “tweets” within Pownce. This would get rid of all the discussions about having to choose between Pownce and Twitter because of your friends or, even worse, using both because your social circle spans both services. It should be less about building walled gardens and more about increasing the size of the entire market for everyone via interoperability. Leah pointed out something I’d overlooked. OpenID gives you a way to answer the question “Is leahculver @ Pownce also leahculver @ Twitter?” but it doesn’t tell you how Pownce can then use this information to perform actions on Leah’s behalf on Twitter. Duh. I had implicitly assumed that whatever authentication ticket was returned from the OpenID validation request could be used as an authorization ticket when calling the OpenID provider’s API, but nothing in the specs actually says this has to be the case.

Not only was none of this thinking new to Leah, she informed me that she had been working with folks from Yahoo!, Google, Six Apart, Twitter, and other companies on a technology specification called OAuth, the purpose of which is to solve the problem I had just highlighted. There is no spec draft on the official site at the current time but you can read version 0.9 of the specification. The introduction of the specification reads

The OAuth protocol enables interaction between a Web Service Provider(SP) and Consumer website or application. An example use case would be allowing Moo.com, the OAuth Consumer, to access private photos stored on Flickr.com, the OAuth Service Provider. This allows access to protected resources (Protected Resources) via an API without requiring the User to provide their Flickr.com (Service Provider) credentials to Moo.com (Consumer). More generically, OAuth creates a freely implementable and generic methodology for API authentication creating benefit to developers wishing to have their Consumer interact with various Service Providers.

While OAuth does not require a certain form of user interface or interaction with a User, recommendations and emerging best practices are described below. OAuth does not specify how the Service Provider should authenticate the User which makes the protocol ideal in cases where authentication credentials are not available to the Consumer, such as with OpenID.

This is an interesting example of collaboration between competitors in the software industry and a giant step towards actual interoperability between social networking sites and social graph applications as opposed to mere social network portability.  

The goal of OAuth is to move from a world where sites collect users’ credentials (i.e. usernames/passwords) for other Web sites so they can screen scrape the user’s information, to one where users authorize Web sites and applications to act on their behalf on the target sites, in a way that puts the user in control and doesn’t require giving up their usernames and passwords to potentially untrustworthy sites (Quechup-flavored spam, anyone?). The way it does so is by standardizing the following interaction/usage flow; a rough code sketch of the Web-based variant follows the list.

Pre-Authentication

  1. The Consumer Developer obtains a Consumer Key and a Consumer Secret from the Service Provider.

Authentication

  1. The Consumer attempts to obtain a Multi-Use Token and Secret on behalf of the end user.
    • Web-based Consumers redirect the User to the Authorization Endpoint URL.
    • Desktop-based Consumers first obtain a Single-Use Token by making a request to the API Endpoint URL then direct the User to the Authorization Endpoint URL.
    • If a Service Provider is expecting Consumers that run on mobile devices or set top boxes, the Service Provider should ensure that the Authorization Endpoint URL and the Single-Use Token are short and simple enough to remember for entry into a web browser.
  2. The User authenticates with the Service Provider.
  3. The User grants or declines permission for the Service Provider to give the Consumer a Multi-Use Token.
  4. The Service Provider provides a Multi-Use Token and Multi-Use Token Secret or indicates that the User declined to authorize the Consumer’s request.
    • For Web-based Consumers, the Service Provider redirects to a pre-established Callback Endpoint URL with the Single Use Token and Single-Use Authentication Secret as arguments.
    • Mobile and Set Top box clients wait for the User to enter their Single-Use Token and Single-Use Secret.
    • Desktop-based Consumers wait for the User to assert that Authorization has completed.
  5. The Consumer exchanges the Single-Use Token and Secret for a Multi-Use Token and Secret.
  6. The Consumer uses the Multi-Use Token, Multi-Use Secret, Consumer Key, and Consumer Secret to make authenticated requests to the Service Provider.
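
Here is a minimal sketch of the Web-based Consumer flow above. Every endpoint URL and parameter name is a hypothetical placeholder, not taken from the draft spec, and the real protocol signs requests with the secrets rather than transmitting them (signing is elided here); the point is the shape of the exchange.

using System;
using System.Net;

class OAuthFlowSketch
{
    // Obtained out of band from the Service Provider (pre-authentication step).
    const string ConsumerKey = "my-consumer-key";
    const string ConsumerSecret = "my-consumer-secret";

    static void Main()
    {
        // Step 1: a Web-based Consumer redirects the User to the Service
        // Provider's Authorization Endpoint URL.
        string authorizeUrl =
            "https://provider.example.com/authorize?consumer_key=" + ConsumerKey +
            "&callback=" + Uri.EscapeDataString("https://consumer.example.com/callback");
        Console.WriteLine("Redirect user to: " + authorizeUrl);

        // Steps 2-4 happen on the Service Provider's site: the User logs in,
        // grants or declines permission, and the provider redirects back to the
        // Callback Endpoint URL with a single-use token as an argument.
        string singleUseToken = "token-parsed-from-callback"; // placeholder

        // Step 5: exchange the single-use token for a multi-use token + secret.
        // (A real implementation signs this request with ConsumerSecret instead
        // of sending secrets in the clear.)
        using (WebClient client = new WebClient())
        {
            string response = client.DownloadString(
                "https://provider.example.com/exchange_token?consumer_key=" +
                ConsumerKey + "&token=" + singleUseToken);
            // Step 6: use the returned multi-use token and secret, plus the
            // consumer key and secret, on every subsequent API request.
            Console.WriteLine(response);
        }
    }
}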

This standardizes the kind of user-centric API model that is utilized by Web services such as the Windows Live Contacts API, Google AuthSub and the Flickr API to authenticate and authorize applications to access a user’s data. I suspect that the reason OAuth does not mandate a particular authentication mechanism is to grandfather in the various authentication mechanisms used by all of these APIs today. It’s one thing to ask vendors to add new parameters and return types to the information exchanged during user authentication between Web sites and another to request that they completely replace one authentication technology with another.

I’d love to see us get behind an effort like this at Windows Live*. I’m not really sure if there is an open way to participate. There seems to be a Google Group but it is private and all the members seem to be folks that know each other personally. I guess I’ll have to shoot some mail to the folks I met at the Data Sharing Summit or maybe one of them will see this post and will respond in the comments.

PS: More details about OAuth can be found in the blog post by Eran Hammer-Lahav entitled Explaining OAuth.

*Disclaimer: This does not represent the intentions, strategies, wishes or product direction of my employer. It is merely wishful thinking on my part.

Now playing: Sean Kingston - Beautiful Girls (remix) (feat. Fabolous & Lil Boosie)


 

We attended Justin Timberlake’s FutureSex/LoveShow on Saturday and it was my best concert-going experience in the Seattle area to date. A lot had to do with the location. The Tacoma Dome, being an enclosed building and a sports arena, has the right combination of acoustics and availability of vending services. I attended Eminem’s Anger Management 3 and Kenny Chesney’s The Road & Radio at the White River Amphitheatre and Qwest Field respectively, and both locations suffered from lousy acoustics because they were open air venues.

We missed the opening band, which didn’t bother me once it was confirmed that it was not going to be Timbaland. JT’s performance was top notch; he not only performed the hits from both albums but also threw in some unexpected surprises like his verses from Gone and Dick in a Box. After the audience was done singing along to the latter, he joked, “You guys watch too much YouTube”. I was amused that he didn’t say “You guys watch too much SNL”. He also mentioned that it had just won an Emmy that night. “Only in America,” he laughed.

I was surprised to notice that the audience seemed to be an older crowd than the one at Bumbershoot. That said, all those JT fans in their 20s and 30s still scream like teenage girls, so I’d suggest some ear plugs if you’re ever thinking of attending one of his concerts.

Now playing: Justin Timberlake - What Goes Around.../...Comes Around Interlude


 

Categories: Music

From the documentation for Directory.GetFiles Method (String, String)

When using the asterisk wildcard character in a searchPattern, such as "*.txt", the matching behavior when the extension is exactly three characters long is different than when the extension is more or less than three characters long. A searchPattern with a file extension of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern. A searchPattern with a file extension of one, two, or more than three characters returns only files having extensions of exactly that length that match the file extension specified in the searchPattern.

I realize this behavior is probably well known to Windows developers, but seriously, WTF? How do I match on three character file extensions (i.e. the majority of file extensions) without getting cruft as well (e.g. matching on .cpp files without getting Emacs backups with .cpp~ file extensions)?
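
One workaround, sketched below: let GetFiles over-match and then post-filter on the exact extension.

using System;
using System.IO;

class ExactExtensionMatch
{
    // Returns only files whose extension is exactly the one asked for, working
    // around the three-character-extension over-matching described above.
    static string[] GetFilesWithExactExtension(string path, string extension)
    {
        // "*.cpp" also matches ".cpp~", ".cpps", etc., so check the real
        // extension of each candidate before keeping it.
        string[] candidates = Directory.GetFiles(path, "*" + extension);
        return Array.FindAll(candidates, delegate(string file)
        {
            return string.Equals(Path.GetExtension(file), extension,
                                 StringComparison.OrdinalIgnoreCase);
        });
    }

    static void Main()
    {
        foreach (string file in GetFilesWithExactExtension(@"C:\src", ".cpp"))
            Console.WriteLine(file); // .cpp files only, no Emacs backups
    }
}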

Now playing: Mystikal - I Smell Smoke


 

Categories: Programming

Since tomorrow is the Data Sharing Summit, I was familiarizing myself with the OpenID specification because I was wondering how it dealt with people making false claims about their identity.

Scenario: I go to http://socialnetwork.example.com which supports OpenID and claim that my URL is http://brad.livejournal.com (i.e. I am Brad Fitzpatrick). The site redirects me to https://www.livejournal.com/login.bml along with a query string that has certain parameters specified (e.g. “?openid.mode=checkid_immediate&openid.identity=brad&openid.return_to=http://socialnetwork.example.com/home.html”) which is a long-winded way of saying “LiveJournal can you please confirm that this user is brad then redirect them back to our site when you’re done?” At this point, I could make an HTTP request to http://socialnetwork.example.com/home.html, specify the Referer header value as https://www.livejournal.com/login.bml and claim that I’ve been validated by LiveJournal as brad.

This is a pretty rookie example but it gets the idea across. OpenID handles this spoofing problem by requiring an OpenID consumer (e.g. http://socialnetwork.example.com) to first make an association request to the target OpenID provider (e.g. LiveJournal) before performing any identity validation. The purpose of the request is to get back an association handle, which is in actuality a shared secret between the two services that the Consumer must specify as part of each checkid_immediate request made to the Identity Provider.

There is also a notion of a dumb mode where instead of making the aforementioned association request, the consumer asks the Identity Provider whether the assoc_handle returned by the redirected user is a valid one via a check_authentication request. This is a somewhat chattier way to handle the problem but it leads to the same results.
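
As a rough sketch of what the consumer-side check boils down to: in OpenID 1.x the provider signs the key-value form of the fields named in openid.signed with HMAC-SHA1, keyed with the MAC key from the association, and the consumer recomputes that signature. The key and field values below are made up for illustration, and the key-value serialization is simplified; see the spec for the exact rules.

using System;
using System.Security.Cryptography;
using System.Text;

class OpenIdSignatureCheck
{
    // Verifies that openid.sig was produced with the shared MAC key obtained
    // during association, i.e. that the assertion really came from the provider.
    static bool VerifyAssertion(byte[] macKeyFromAssociation,
                                string signedFieldsKeyValueForm,
                                string sigFromProvider)
    {
        using (HMACSHA1 hmac = new HMACSHA1(macKeyFromAssociation))
        {
            byte[] computed = hmac.ComputeHash(
                Encoding.UTF8.GetBytes(signedFieldsKeyValueForm));
            return Convert.ToBase64String(computed) == sigFromProvider;
        }
    }

    static void Main()
    {
        byte[] macKey = Convert.FromBase64String("c2hhcmVkLXNlY3JldC1rZXk="); // illustrative key
        string keyValueForm = "mode:id_res\nidentity:http://brad.livejournal.com/\n";
        Console.WriteLine(VerifyAssertion(macKey, keyValueForm, "sig-from-query-string"));
    }
}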

So far, I think I like OpenID. Good stuff.

Now playing: Gangsta Boo - Who We Be


 

Categories: Technology

September 6, 2007
@ 07:29 PM

Chris Jones has a blog post on the Windows Live Wire blog entitled Test drive the new Windows Live suite where he writes

You’ve probably already read about some changes we’re making to Windows Live, and have seen some of your services change over the past few weeks. Starting now, you can test out the new suite of Windows Live software at http://get.live.com/wl/all

Windows Live makes it easy to store and manage your communications and information, and share what’s going on in your life with the people who mean the most to you. Many of you have already tried out new versions of our web services – Windows Live Hotmail, Windows Live Spaces, Windows Live SkyDrive beta, and the new Windows Live Home page beta. These have been designed to work together with a common navigation, so it is easy to switch between your e-mail, your space, your files, and your photos—from any browser.

Today we’re releasing beta versions of a new generation of Windows Live software designed for your Windows PC that makes it easier than ever to get connected to Windows Live or other services. This suite of software includes e-mail (Windows Live Mail), photo sharing (Windows Live Photo Gallery), a great publishing tool that lets you post directly to your blog (Windows Live Writer), parental controls (Windows Live OneCare Family Safety), a new version of Windows Live Messenger (8.5), and more.

As you can tell, Windows Live is coming together and there is growing clarity around the brand. All the talk of being a “suite” and of unified installers struck me as anachronisms from an executive team that came from the world of Office and Windows, but now that I’ve begun to see some of the fruits of their labor it looks like a good thing. An integrated set of desktop and Web apps that play well together makes a lot of sense.

I’ve also surprised myself by liking the more consistent UI across [some of] the various Windows Live sites but would like to see us do more integration of the Web applications. For example, the integration between SkyDrive and Windows Live Spaces is cool, but it seems we are last out of the gate with integration between IM and email (unlike Yahoo! Mail and GMail). I’d also like to see a couple more Windows Live services, such as Windows Live Expo and Live QnA, become available from http://home.live.com and share the same consistent UI. I guess I’m hard to please. :) Kudos to all the folks that worked on the current releases.

The unified installer is one of those things that seems weird but after using it I wonder why we didn’t provide one sooner. It’s pretty convenient to be able to grab the latest Windows Live apps at a single go. It’s definitely worth trying out especially if you haven’t tried out Windows Live Photo Gallery or Windows Live Mail yet. So what are you waiting for? Get it now.

Now playing: Three 6 Mafia - Most Known Unknown Hits


 

Categories: Windows Live

Yesterday the Wall Street Journal had an article entitled Why So Many Want to Create Facebook Applications which gives an overview of the burst of activity surrounding the three-month-old Facebook platform. If a gold rush, complete with dedicated VC funds targeting widget developers, around building embedded applications in a social networking site sounds weirdly familiar to you, that’s because it is. This time last year people were saying the same thing about building MySpace widgets. The conventional wisdom at the time was that sites like YouTube (acquired for $1.65 billion) and PhotoBucket (acquired for $250 million) rose in popularity due to their MySpace widget strategy.

So, why would developers who’ve witnessed the success of companies developing MySpace widgets rush to target a competing social networking site that has fewer users and requires more code to integrate with? The answer is that MySpace made the mistake of thinking that they were a distribution channel instead of a platform. If you are a distribution channel, you hold all the cards. Without you, they have no customers. On the other hand, if you are a platform vendor you realize that it is a symbiotic relationship and you have to make people building on your platform successful because of [not in spite of] your efforts.

Here are the three classic mistakes the folks at MySpace made which made it possible for Facebook to steal their thunder and their widget developers.

  1. Actively Resent the Success of Developers on Your Platform: If you are a platform vendor, you want developers building on your platform to be successful. In contrast, MySpace’s executives publicly griped about the success of sites like YouTube and PhotoBucket that were “driven off the back of MySpace” and bragged about building competing services which would become just as popular since “60%-70% of their traffic came from MySpace”. A sign that things had gotten out of hand was MySpace blocking PhotoBucket widgets only to acquire the site a month later, which indicates that the block was an aggressive negotiation tactic intended to scare off potential buyers.

  2. Limit the Revenue Opportunities of Developers on Your Platform: MySpace created all sorts of restrictions to make it difficult for widget developers to actually make money directly from the site. For one, they blocked any widget that contained advertising, even though advertising is the primary way to make money on the Web. Secondly, they restricted the options widgets had for linking back to the widget developer’s website, preventing developers from driving users to the one place where they could actually show them ads. Instead of trying to create a win<->win situation for widget developers (MySpace gets free features and thus more engagement from their users; widget developers get ad revenue and traffic), the company tipped the balance excessively in its favor with little upside for widget developers.

  3. Do Not Invest in Your Platform: For a company that depends so much on developers building tiny applications that integrate into their site, it’s quite amazing that MySpace does not provide any APIs at all. Nor do they provide a structured way for their users to find, locate and install widgets. It turns out that Fox Interactive Media (MySpace’s parent company) did build a widget platform and gallery, but due to internal politics these services are not integrated. In fact, one could say that MySpace has done as little as possible to make developing widgets for their platform a pleasant experience for developers or their users.

This is pretty much the story of all successful technology platforms that fall out of favor. If you do not invest in your platform, it will become obsolete. If people are always scared that you will cut off their air supply out of jealousy, they’ll bolt the first chance they get. And if people can’t make money building on your platform, then there is no reason for them to be there in the first place. Don’t make the same mistakes.

Now playing: 50 Cent - Many Men (Wish Death)


 

We were at Bumbershoot on Monday because one of Jenna's friends is the drummer in the Sneaky Thieves and we came to show our love. Since we were already there we decided to stay for two of the main stage concerts.

We saw John Legend in the afternoon and his set was quite good even though the acoustics weren’t that great since it’s an open air stadium. Once we figured out that we needed to go down in front of the stage instead of sitting up in the bleachers, it went from “aight” to “tight”.

The late show was Wu-Tang Clan and they represented. There were tracks from solo albums, from their classic first and second albums, and even some of Ol’ Dirty’s singles rapped by Method Man. It was sick. The main surprise of the show was seeing how many kids who looked like they weren’t even born when Enter the Wu-Tang (36 Chambers) first dropped were in attendance. It was also kinda scary seeing so many kids blowing doja, but I tried to remember that it was the same way when I was in my teens. Dang, I’m already getting too old for concerts.

In between John Legend and Wu-Tang, we went to see Rush Hour 3. It was pretty bad. Not only did the plot make no sense at all but they also reused plot elements from the previous movies in a non-ironic way. There were laughs but they came infrequently and the action was heavily toned down [probably because Jackie Chan is now in his fifties]. Overall, I give it *** out of ***** because it was still better than most of the crap Hollywood puts out these days.   

Now playing: Wu-Tang Clan - Protect Ya Neck


 

Categories: Movie Review | Music | Personal

One of the core tenets we’ve had when designing social graph applications within Windows Live is that we always put users in control, which means privacy features and opt-outs galore. Manifestations of this include

  • You can’t IM a Windows Live Messenger user unless they’ve given you permission to do so, so IM spam is pretty much nonexistent on our network. At worst, there is the potential of getting lots of IM buddy requests from spammers if you have a guessable email address, but even that problem has seemed more theoretical than real in our experience.

  • Don’t like getting friend invites from Windows Live Spaces? You can opt out of getting them completely, or restrict them to your first-degree social network (e.g. IM buddies) or your second-degree network (e.g. friends of friends).

  • If a non-Microsoft application wants to access your social graph (e.g. IM buddy list or Hotmail address book) using our contact APIs, not only does it need access to your log-in credentials but it also needs explicit permission from you which can be revoked if the application becomes untrustworthy.

The last item is what I want to talk about today. Pete Cashmore over at Mashable has a blog post entitled Are You Getting Quechup Spammed? where he writes

One controversial issue among social networks is how hard they should push for user acquisition. Most social networks these days let you to import your email address book in some way (Twitter is the latest), but most make it clear if they’re about to mail your contacts.

One site that’s catching people off guard is Quechup: we’ve got a volley of complaints about them in the mailbox this weekend, and a quick Google reveals that others were caught out too.

The issue lies with their “check for friends” form: during signup you’re asked to enter your email address and password to see whether any of your friends are already on the service. Enter the password, however, and it will proceed to mail all your contacts without asking permission. This has led to many users issuing apologies to their friends for “spamming” them inadvertently. Hopefully the bad PR on this one will force them to change the system.

In related news, ZDnet investigates social services Rapleaf and UpScoop, pointing out that they’re run by TrustFuse, a company that sells data to marketers. UpScoop lets you enter your email address and password and find all your friends on social networks. The company is not selling the email addresses you input, but those clients who already have lists of email addresses can bring those to TrustFuse and receive additional information about those people mined from public social networking profiles. The aggregation of all that data is perfectly legal and perhaps even ethically sound, but it’s a little unnerving for some.

I won’t comment on the legality of these services except to point out that a number of practices used to obtain a user’s contact list violate the Terms of Service of the sites they are obtained from especially when these sites have APIs. Of course, I am not a lawyer and don’t play one on TV.

I will point out that 9 times out of 10, when you hear geeks talking about social network portability or similar buzzwords, they are really talking about sending people spam because someone they know joined some social networking site. I also wonder how many people realize that the fly-by-night social networking sites they happily hand their log-in credentials to, so they can spam their friends, also share the lists of email addresses thus obtained with services that resell them to spammers.

This brings me to Brad Fitzpatrick’s essay Thoughts on the Social Graph, which lists the following as one of the goals of a project he is working on while at Google

For end-users:

  1. A user should then be able to log into a social application (e.g. dopplr.com) for the first time, ideally but not necessarily with OpenID, and be presented with a dialog like,
    "Hey, we see from public information elsewhere that you already have 28 friends already using dopplr, shown below with rationale about why we're recommending them (what usernames they are on other sites). Which do you want to be friends with here? Or click 'select-all'."
    Also every so often while you're using the site dopplr lets you know if friends that you're friends with elsewhere start using the site and prompts you to be friends with them. All without either of you re-inviting/re-adding each other on dopplr... just because you two already declared your relationship publicly somewhere else. Note: some sites have started to do things like this, in ad-hoc hacky ways (entering your LJ username to get your other LJ friends from FOAF, or entering your email username/password to get your address book), but none in a beautiful, comprehensive way.

The question that runs through my mind is: if you are going to build a system like this, how do you prevent badly behaved applications like Quechup from taking control away from your users? At the end of the day, your users might end up thinking you sold their email addresses to spammers when in truth it was the insecure practices of the people they’d shared their email addresses with that got them in that mess. This is one of the few reasons I can understand why Facebook takes such a hypocritical approach. :)

At least Brad's design seems to assume that the only identifiers for users within his system will be the equivalent of foaf:mbox_sha1sum. However I suspect that many of the startups expressing interest in this space are interested in sharing rich profile data and legitimate contact information not just hashes of interesting data.
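
For reference, foaf:mbox_sha1sum is just the SHA1 hash of a mailto: URI, which lets two sites compare contacts without exchanging raw email addresses. A minimal sketch of computing one:

using System;
using System.Security.Cryptography;
using System.Text;

class MboxSha1Sum
{
    static string ComputeMboxSha1Sum(string email)
    {
        // Hash the full mailto: URI and render it as lowercase hex, which is
        // how mbox_sha1sum values conventionally appear in FOAF files.
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.ASCII.GetBytes("mailto:" + email));
            StringBuilder hex = new StringBuilder();
            foreach (byte b in hash)
                hex.Append(b.ToString("x2"));
            return hex.ToString();
        }
    }

    static void Main()
    {
        // Two sites hashing the same address get the same opaque identifier --
        // but anyone with a list of known addresses can hash them and reverse
        // the mapping, which is the caveat raised in the postscript below.
        Console.WriteLine(ComputeMboxSha1Sum("user@example.com"));
    }
}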

I’ll find out if my suspicions are worth anything later this week when I’m at the Data Sharing Summit.

PS: If you really want to put your tin foil hat on, read this post on the Google Group on social network portability, Evil Third Party Graph Analysis, which speculates on all the bad things one could do with a publicly accessible social graph (e.g. find which people in your service have lots of “friends” with bad credit, low income, criminal history, a history of political dissension, poor health, etc so you can discriminate against or target them accordingly), especially if you can tie some of the hashed information back to real data, which should be quite possible for some subset of the people in the graph.

Now playing: Fergie - Big Girls Don't Cry


 

Categories: Social Software