Recently someone at work asked me if I thought social networking on the Web was a fad. My response was that it depends on what you mean by social networking, since lots of Web applications are lumped into that category, but that either way I think all of these different categories of applications are here to stay.

I thought it would be useful to throw out a couple of definitions so that we all have a shared vocabulary when talking about the different types of Web applications that incorporate some form of social networking. A number of these terms were popularized by Mark Zuckerberg, which is a testament to the way Facebook has cornered the thought leadership in this space.

  1. Social Graph: If one were to render the various ways different people in a particular community were connected into a data structure, it would be a graph.  In a social graph, each person is a vertex and each relationship connecting two people is an edge. There can be multiple edges connecting people (e.g. Mike and I work at Microsoft, Mike and I are IM buddies, Mike and I live in Washington state, etc). Edges in the social graph have a label which describes the relationship. Fun examples of social graphs are slander & libel -- the official computer scene sexchart and the Mark Foley Blame chart. A minimal code sketch of such a graph follows at the end of this list.

  2. Social Graph Application: An application that requires or is improved by the creation of a social graph describing the context specific relationships between its users is a social graph application. Examples of applications that require a social graph to actually be usable are instant messaging applications like Skype and Windows Live Messenger.  Examples of applications that are significantly improved by the existence of context specific social graphs within them are Digg, Flickr, Del.icio.us and Twitter, none of which require a user to add themselves to the site’s social graph to utilize its services but all of which become more valuable once users do. One problem with the latter category of sites is that they may require a critical mass of users to populate their social graph before they become compelling.

    Where Facebook has hit the jackpot is that they have built a platform where applications that are compelling once they have a critical mass of users can feed off of Facebook’s social graph instead of trying to build a user base from scratch. Contrast the struggling iLike website with the hugely successful iLike Facebook application.

  3. Social Networking Site: These are a subset of social graph applications. danah boyd has a great definition of social networking sites on her blog which I’ll crib in its entirety: A "social network site" is a category of websites with profiles, semi-persistent public commentary on the profile, and a traversable publicly articulated social network displayed in relation to the profile. Popular examples of such websites are MySpace and Bebo. You can consider these sites to be the next step in the evolution of the personal homepage, now incorporating richer media, more avenues for self expression and more interactivity than our GeoCities and Tripod pages of old.

  4. Social Operating System: These are a subset of social networking sites. In fact, the only application in this category today is Facebook.  Before you use your computer, you have to boot your operating system and every interaction with your PC goes through the OS. However instead of interacting directly with the OS, most of the time you interact with applications written on top of the OS. Similarly a Social OS is the primary application you use for interacting with your social circles on the Web. All your social interactions, whether they be hanging out, chatting, playing games, watching movies, listening to music, engaging in private gossip or public conversations, occur within this context. This flexibility is enabled by the fact that the Social OS is a platform that enables one to build various social graph applications on top of it.
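To make definition #1 concrete, here is a minimal sketch (in Python, with made-up people and relationships) of a social graph modeled as a labeled multigraph, where several labeled edges can connect the same pair of vertices:

```python
from collections import defaultdict

class SocialGraph:
    """A labeled multigraph: people are vertices, labeled relationships are edges."""

    def __init__(self):
        # Maps an unordered pair of people to the set of labels describing
        # the relationships that connect them.
        self.edges = defaultdict(set)
        self.people = set()

    def add_relationship(self, person_a, person_b, label):
        """Add a labeled edge between two people (undirected for simplicity)."""
        self.people.update((person_a, person_b))
        self.edges[frozenset((person_a, person_b))].add(label)

    def relationships(self, person_a, person_b):
        """Return every labeled edge connecting two people."""
        return self.edges[frozenset((person_a, person_b))]

# Multiple edges can connect the same two vertices.
graph = SocialGraph()
graph.add_relationship("Dare", "Mike", "works at Microsoft with")
graph.add_relationship("Dare", "Mike", "is an IM buddy of")
graph.add_relationship("Dare", "Mike", "lives in Washington state like")

# Prints all three relationship labels (set order may vary).
print(graph.relationships("Dare", "Mike"))
```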

By the way, on revisiting my schedule I do believe I should be able to attend the Data Sharing Summit on Friday next week. I'll only be in the area for that day but it should be fun to chat with folks from various companies working in this space and get to share ideas about how we can all work together to make the Web a better place for our users.

Now playing: Snoop Doggy Dogg - That's That Shit (feat. R. Kelly)


 

Categories: Social Software

There's an article on InfoQ entitled "Code First" Web Services Reconsidered which begins

Are you getting started on developing SOAP web services? If you are, you have two development styles you can chose between. This first is called “start-from-WSDL”, or “contract first”, and involves building a WSDL service description and associated XML schema for data exchange directly. The second is called “start-from-code”, or “code first”, and involves plugging sample service code into your framework of choice and generating the WSDL+schema from that code.

With either development style, the end goal is the same – you want a stable WSDL+schema definition of your service. This goal is especially important when you’re working in a SOA environment. SOA demands loose coupling of services, where the interface is fixed and separate from the implementation.

There are lots of problems with this article, the main one being that choosing between “contract first” and “code first” SOAP Web Services development styles is like choosing between putting a square peg in a round hole and putting a round peg in a square hole. Either way, you will have to deal with the impedance mismatch between W3C XML Schema (XSD) and objects from your typical OO system. This is because practically every SOAP Web service toolkit does some level of XML<->object mapping which ends up being lossy, because W3C XML Schema contains several constructs that don’t really map well to objects and, depending on your platform choice, objects may contain constructs that don’t really map well to W3C XML Schema.

The only real consideration when deciding between “code first” and “contract first” approaches is whether your service is concerned primarily with objects or whether it is concerned primarily with XML documents [preferably with a predefined schema]. If you are just moving around data objects (i.e. your data model isn’t much more complex than JSON) then you are better off using a “code first” approach, especially since most SOAP toolkits can handle the basic constructs that would result from generating WSDLs from such types. On the other hand, if you are transmitting documents that have a predefined schema (e.g. XBRL or OOXML documents) then you are better off authoring the WSDL by hand than trying to jump through hoops to get the XML<->object mapping technology in your off-the-shelf SOAP toolkit to do a great job with these schemas. Unfortunately, anyone consuming your SOAP Web service who is using an off-the-shelf SOAP toolkit will likely have interoperability issues when their toolkit tries to process schemas that fully utilize a significant number of the features of W3C XML Schema. If your situation falls somewhere in the middle, then you’re probably screwed regardless of which approach you choose.
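To illustrate the lossiness (this is a contrived document, not the behavior of any specific toolkit), consider what happens when a document-centric payload with mixed content and repeated elements is flattened into the kind of property bag a typical XML<->object mapper produces:

```python
import xml.etree.ElementTree as ET

# Mixed content (text interleaved with elements) and repeated child elements
# are perfectly legal in W3C XML Schema...
doc = ET.fromstring(
    "<review>The movie was <emphasis>great</emphasis> although the "
    "<emphasis>ending</emphasis> dragged.<rating>4</rating><rating>5</rating></review>"
)

# ...but a naive "one field per child element" object mapping silently drops
# the interleaved text and collapses the repeated elements.
as_object = {child.tag: child.text for child in doc}
print(as_object)  # {'emphasis': 'ending', 'rating': '5'} -- most of the document is gone
```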

One of the interesting consequences of the adoption of RESTful Web services is that these interoperability issues have been avoided because most people who provide these services do not provide schemas written in W3C XML Schema, thus discouraging the foolishness of XSD-based XML<->object mapping that causes interoperability problems in the SOAP world. Instead, most vendors who expose RESTful APIs that want to provide an object-centric programming model for developers who don’t want to deal with XML either create or encourage the third party creation of client libraries on target platforms which wrap their simple, RESTful Web services with a more object oriented facade. Examples of vendors who have gone with this approach for their RESTful APIs include

That way XML geeks like me get to party on the raw XML which is in a simple and straightforward format while the folks who are scared of XML can party on objects using their favorite programming language and platform. The best of both worlds.
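Here's a rough sketch of the pattern; the endpoint and element names are hypothetical, but the idea is simply a thin client library that fetches the raw XML and hands plain objects to developers who would rather not touch it:

```python
from urllib.request import urlopen
from xml.etree import ElementTree as ET

class Photo:
    """Plain object handed to developers who don't want to deal with XML."""
    def __init__(self, photo_id, title):
        self.id = photo_id
        self.title = title

class PhotoServiceClient:
    """Object oriented facade over a simple RESTful XML API (hypothetical endpoint)."""

    def __init__(self, base_url="http://api.example.com/photos"):
        self.base_url = base_url

    def get_recent_photos(self, user_id):
        # The raw XML stays simple enough to consume directly with any XML tool...
        raw_xml = urlopen(f"{self.base_url}?user={user_id}").read()
        doc = ET.fromstring(raw_xml)
        # ...while the wrapper exposes plain objects to everyone else.
        return [Photo(p.get("id"), p.findtext("title")) for p in doc.findall("photo")]

# client = PhotoServiceClient()
# for photo in client.get_recent_photos("carnage4life"):
#     print(photo.title)
```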

“Contract first” vs. “Code first”? More like do you want to slam your head into a brick wall or a concrete wall?

Now playing: Big Boi - Kryptonite (I'm On It) feat. Killer Mike


 

Categories: XML Web Services

The Facebook developer blog has a post entitled Change is Coming which details some of the changes they've made to the platform to handle malicious applications including

Requests

We will be deprecating the notifications.sendRequest API method. In its place, we will provide a standard invitation tool that allows users to select which friends they would like to send a request to. We are working hard on multiple versions of this tool to fit into different contexts. The tool will not have a "select all" button, but we hope it enables us to increase the maximum number of requests that can be sent out by a user. The standardized UI will hopefully make it easier for users to understand exactly what they are doing, and will save you the trouble of building it yourself.

Notifications

Soon we will be removing email functionality from notifications.send, though the API function itself will remain active. In the future, we may provide another way to contact users who have added your app, as we know that is important. Deceptive and misleading notifications will continue to be a focus for us, and we will continue to block applications which behave badly and we will continue to iterate on our automated spam detection tools. You will also see us working on ways to automatically block deceptive notifications.

It looks like some but not all of the most egregious behavior is being targeted, which is good. Specifically, I wonder what is meant by deprecating the notifications.sendRequest API. When I think of API deprecation, I think of @deprecated in Java and Obsolete in C#, neither of which prevent the API from being used.
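The same idea expressed in Python terms (purely an analogy, not a description of Facebook's platform): a deprecated method can warn loudly and still keep working, which is why deprecation by itself does nothing to stop the behavior:

```python
import warnings

def send_request(to_user, message):
    """Stand-in for an API method that has been deprecated but not removed."""
    warnings.warn(
        "send_request is deprecated; use the standard invitation tool instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # The call still succeeds, so applications can keep sending requests
    # until the method is actually switched off.
    return f"request sent to {to_user}: {message}"

print(send_request("some_user", "Add the Vampires application!"))
```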

One of my biggest gripes with the site is the number of “friend requests” I get from applications with no way to opt out of getting these requests. However it doesn’t seem that this has been eliminated. Instead an API is being replaced with a UI component but the API isn’t even going away. I hope there is a follow up post where they describe the opt-out options they’ve added to the site so users can opt-out of getting so many unsolicited requests.

Now playing: Big Pun - Punish Me


 

August 28, 2007
@ 04:24 PM

People Who Need to Get the Fuck out of the White House
Donald Rumsfeld
Karl Rove
Alberto Gonzales
Dick Cheney
G.W. Bush

Now playing: Al Green - Tired Of Being Alone


 

Categories: Current Affairs

Robert Scoble has a blog post up entitled Why Mahalo, TechMeme, and Facebook are going to kick Google’s butt in four years where he argues that search based on social graphs (e.g. your Facebook relationships) or generated by humans (e.g. Mahalo) will eventually trump Google's algorithms. I'm not sure I'd predict the demise of Google but I do agree that the social graph can be used to improve search and other aspects of the Internet experience, in fact I agree so much that was the topic of my second ThinkWeek paper which I submitted earlier this year (Microsoft folks can find it here).

However I don’t think Google’s main threat from sites like Facebook is that they may one day build social graph powered search that beats Google’s algorithms. Instead it is that these sites are in direct conflict with Google’s mission to

organize the world's information and make it universally accessible and useful.

because they create lots of valuable content that Google can not access. Google has branched out of Web search into desktop search, enterprise search, Web-based email and enterprise application hosting all to fulfill this mission.

The problem that Google faces with Facebook is pointed out quite well in Jason Kottke’s post Facebook vs. AOL, redux where he writes

Think of it this way. Facebook is an intranet for you and your friends that just happens to be accessible without a VPN. If you're not a Facebook user, you can't do anything with the site...nearly everything published by their users is private. Google doesn't index any user-created information on Facebook.

and in Jeff Atwood's post Avoiding Walled Gardens on the Internet which contains the following excerpt

I occasionally get requests to join private social networking sites, like LinkedIn or Facebook. I always politely decline…public services on the web, such as blogs, twitter, flickr, and so forth, are what we should invest our time in. And because it's public, we can leverage the immense power of internet search to tie it all-- and each other-- together.

What Jason and Jeff are inadvertently pointing out is that once you join Facebook, you immediately start getting less value out of Google’s search engine. This is a problem that Google cannot let continue indefinitely if they plan to stay relevant as the Web’s #1 search engine.

What is also interesting is that thanks to efforts of Google employees like Mark Lucovsky, I can use Google search from within Facebook but without divine intervention I can’t get Facebook content from Google’s search engine. If I were an exec at Google, I’d worry a lot more about the growing trend of users creating Web content that Google cannot access than about all the “me too” efforts coming out of competitors like Microsoft and Yahoo!.

The way you get disrupted is by focusing on competitors who are just like you instead of actually watching the marketplace. I wonder how Google will react when they eventually realize how deep this problem runs?

Now playing: Metallica - Welcome Home (Sanitarium)


 

I try to avoid posting about TechMeme pile ups but this one was just too irritating to let pass. Mark Cuban has a blog post entitled The Internet is Dead and Boring which contains the following excerpts 

Some of you may not want to admit it, but that's exactly what the net has become. A utility. It has stopped evolving. Your Internet experience today is not much different than it was 5 years ago.
...
Some people have tried to make the point that Web 2.0 is proof that the Internet is evolving. Actually it is the exact opposite. Web 2.0 is proof that the Internet has stopped evolving and stabilized as a platform. Its very very difficult to develop applications on a platform that is ever changing. Things stop working in that environment. Internet 1.0 wasn't the most stable development environment. To days Internet is stable specifically because its now boring.(easy to avoid browser and script differences excluded)

Applications like Myspace, Facebook, Youtube, etc were able to explode in popularity because they worked. No one had to worry about their ISP making a change and things not working. The days of walled gardens like AOL, Prodigy and others were gone.
...
The days of the Internet creating explosively exciting ideas are dead. They are dead until bandwidth throughput to the home reaches far higher numbers than the vast majority of broadband users get today.
...
So, let me repeat, The days of the Internet creating explosively exciting ideas are dead for the foreseeable future..

I agree with Mark Cuban that the fundamental technologies that underlie the Internet (DNS and TCP/IP) and the Web in particular (HTTP and HTML) are quite stable and are unlikely to undergo any radical changes anytime soon. If you are a fan of Internet infrastructure then the current world is quite boring because we aren't likely to ever see an Internet based on IPv8 or a Web based on HTTP 3.0. In addition, it is clear that the relative stability in the Web development environment and the increase in the number of people with high bandwidth connections is what has led to a number of the trends that are collectively grouped as "Web 2.0".

However Mark Cuban goes off the rails when he treats his vision of the future of media as the only explosively exciting idea that can be enabled by a global network like the Internet. Mark Cuban is an investor in HDNet which is a company that creates and distributes professionally produced content in high definition video formats. Mark would love nothing more than to distribute his content over the Internet, especially given the lack of interest in HDNet in the cable TV universe (I couldn't find any cable company on the Where to Watch HDNet page that actually carried the channel).

Unfortunately, Mark Cuban's vision of distributing high definition video over the Internet has two problems. The first is the fact that distributing high quality video over the Web is too expensive and the bandwidth of the average Web user is insufficient to make the user experience pleasant. The second is that people on the Web have already spoken and content trumps media quality any day of the week. Remember when pundits used to claim that consumers wouldn't choose lossy, compressed audio on the Web over lossless music formats? I guess no one brings that up anymore given the success of the MP3 format and the iPod. Mark Cuban is repeating the same mistake with his HDNet misadventure.  User generated, poor quality video on sites like YouTube and the larger library of content on sites like Netflix: Instant Viewing are going to trump the limited line up on services like HDNet regardless of how much higher definition the video quality gets.

Mark Cuban has bet on a losing horse and he doesn't realize it yet. The world has changed on him and he's still trying to work within an expired paradigm. It's like a newspaper magnate blaming printer manufacturers for not making it easy to print a newspaper off of the Web instead of coming to grips with the fact that the Internet, with its blogging/social media/user generated content/Craigslist and all that other malarkey, has turned his industry on its head.

This is what it looks like when a billionaire has made a bad investment and doesn't know how to recognize the smell of failure blasting his nostrils with its pungent aroma.


 

Categories: Current Affairs | Technology

August 26, 2007
@ 11:10 PM

Over the last few months, there have been numerous articles by various bloggers and mainstream press speculating on whether use of Facebook will supplant email. I have also noticed that I use the private messaging feature in Facebook to keep in touch with more people, more often than I do with non-work related email.

I used to think that this was a welcome development from a user's point of view because spam is pretty much eliminated due to in-built white lists based on social networks. Or at least that's what I thought. However over the past few weeks I've been getting more and more unsolicited private messages on the site which aren't p3n1s enlargement or 419 scams but are still unsolicited. On taking a look at the privacy options, it turns out that there doesn't seem to be a way to opt out of being contacted by random people on the site. 

In fact, I'm surprised that regular spammers haven't yet flooded Facebook given how they seem to end up everywhere else you let people contact each other directly over the Web. One thing I find confusing is that I could swear that there was an option to opt out of messages from people who aren't on your friends list or in one of your Networks. Or was this just my imagination?

The other kind of unsolicited mail that is totally wrecking my Facebook experience is unsolicited friend requests from Facebook applications. These aren't just regular friend requests. It seems that every application a user adds can make friend requests specific to the application. I'm getting friend requests from My Questions and Likeness on an almost daily basis with no way to permanently ignore friend requests from these applications.

I guess Clay Shirky's old saying is true, the definition of social software is stuff that gets spammed.


 

Categories: Social Software

August 24, 2007
@ 06:44 PM

Recently, my status message on Facebook was I'm now convinced microformats are a stupid idea. Shortly afterwards I got a private message from Scott Beaudreau asking me to clarify my statement. On reflection, I realize that what I find stupid is when people suggest using microformats and screen scraping techniques instead of utilizing an API when the situation calls for one. For example, the social network portability proposal on the microformats wiki states

The "How To" for social network profile sites that want to solve the above problems and achieve the above goals.

  1. Publish microformats in your user profiles:
    1. implement hCard on user profile pages. See hcard-supporting-profiles for sites that have already done this.
    2. implement hCard+XFN on the list of friends on your user profile pages. See hcard-xfn-supporting-friends-lists for sites that already do this (e.g. Twitter (http://twitter.com/)).
  2. Subscribe to microformats for your user profiles:
    1. when signing up a new user:
      1. let a user fill out and "auto-sync" from one of their existing hcard-supporting-profiles, their name, their icon etc. Satisfaction Inc already supports this. (http://microformats.org/blog/2007/06/21/microformatsorg-turns-2/)
      2. let a user fill out and "auto-sync" their list of friends from one of their existing hCard+XFN supporting friends lists. Dopplr.com already supports this.

It boggles my mind to see the suggestion that applications should poll HTML pages to do data synchronization instead of utilizing an API. Instead of calling friends.get, why don't we just grab the entire friends page then parse out the handful of useful data that we actually need?
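A sketch of the two approaches side by side makes the point; the HTML below is a simplified stand-in for a real profile page and the client object in the last line is hypothetical, but the shape of the work is accurate:

```python
from html.parser import HTMLParser

class XFNFriendScraper(HTMLParser):
    """Pull XFN 'friend' links out of an entire profile page."""
    def __init__(self):
        super().__init__()
        self.friends = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "friend" in attrs.get("rel", "").split():
            self.friends.append(attrs.get("href"))

profile_html = """
<html><body>
  <div class="vcard"><a class="url fn" href="http://example.com/carnage4life">Dare</a></div>
  <ul><li><a rel="friend met" href="http://example.com/mike">Mike</a></li></ul>
</body></html>
"""

# Microformats approach: download the whole HTML page, parse it, and poll it
# periodically to notice when the friends list changes.
scraper = XFNFriendScraper()
scraper.feed(profile_html)
print(scraper.friends)  # ['http://example.com/mike']

# API approach: one call that returns exactly the data you asked for,
# e.g. friends.get in the Facebook API (api_client is a hypothetical wrapper).
# friends = api_client.friends.get(uid="carnage4life")
```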

There are a number of places microformats are a good fit, especially in situations where the client application has to parse the entire HTML document anyway. Examples include using hCard to enable features like Live Clipboard or using hReview and hCalendar to help microformat search engines. However using them as a replacement for an API or an RSS feed is like using boxing gloves instead of oven mitts when baking a pie.

If all you have is a hammer, everything looks like a nail.

Now playing: D12 - American Psycho II (feat. B-Real)


 

David Berlind has a blog post entitled If ‘you’ build OpenID, will ‘they’ come? where he writes

In case you missed it last week, Microsoft is taking another swing at the idea of single sign-on technologies. Its first, Passport, failed miserably. Called Windows Live ID (following in the footsteps of everything else “Windows Live”), I guess you could call this “Son of Passport” or “Passport: The Sequel.” The question is (for Microsoft as much as anyone else), down the road, will we have “Passport The Thirteenth”?

When I saw the announcement, the first thought that went through my mind was whether or not Microsoft’s WLID service would also “double” as an OpenID node. OpenID is another single sign-on specification that has been gaining traction in open circles (no suprise there) and the number of OpenID nodes (providers of OpenID-based authentication) is growing.

In light of the WLID announcement from Microsoft and given the discussions that the Redmond company’s chief identity architect Kim Cameron and I have had (see After Passport, Microsoft is rethinking identity) about where Microsoft has to go to be more of an open player on the identity front, I tried to track him down to get an update on why WLID and OpenID don’t appear to be interoperable (I could be wrong on this).

Somewhere along the line, people have gotten the mistaken impression that the Windows Live ID Web Authentication SDK is about single sign-on. It isn’t. The primary reason for opening up our authentication system is to let non-Microsoft sites build and host widgets that access a user’s data stored within Windows Live or MSN services. This is spelled out in the recent blog posting about the release on the Windows Live ID team blog which is excerpted below

The benefits of incorporating Windows Live ID into your Web site include:

  * The ability to use Windows Live gadgets, APIs and controls to incorporate authenticated Windows Live services into your site.

For example, the recently announced collaboration between Windows Live and Bebo requires a way for Windows Live users on Bebo to authenticate themselves and utilize Windows Live services from the Bebo site. That’s what the Windows Live ID Web Authentication SDK is meant to enable.

Although the technological approaches are similar, the goal is completely different from that of OpenID which is meant to be a single sign-on system. 

Now playing: Mase - Return Of The Murda


 

Categories: Windows Live

This morning there were a number of news stories about the collaboration between Windows Live and Bebo. These news stories didn’t tell the whole story. Articles such as C|Net’s Bebo's new instant messaging is Microsoft-flavored and TechCrunch’s Windows Live Messaging Comes to Bebo give the impression that the announcement was about instant messaging. However there was much more to the announcement. The agreement between Windows Live and Bebo spans two areas: social network portability, and interoperability between Web-based IM and Windows Live Messenger.

  1. Social Network Portability: As I’ve mentioned before, a common practice among social networking sites is to ask users for their log-in credentials for their email accounts so that the social networking sites can screen scrape the HTML for the address book and import the user’s contact list into the social networking site. There are a number of problems with this approach, the main one being that the user is simply moving data from one silo to another without being able to get their contact list back from the social network and into their email client. There’s also the problem that this approach makes users more susceptible to phishing since it encourages them to enter their log-in credentials on random sites.  Finally, the user isn’t in control of how much data is pulled from their address book by the social network or how often it is pulled.

    The agreement between Windows Live and Bebo enables users to utilize a single contact list across both sites. Their friends in Bebo will be available as their contacts in Windows Live and vice versa. This integration will be facilitated by the Windows Live Contacts API which implements a user-centric access control model where the user grants applications permission to access and otherwise manipulate their contact list.

  2. Web-based IM and Windows Live Messenger interoperability: Users of Bebo that are also Windows Live Messenger users can opt in to getting notifications from Bebo as alerts in their desktop IM client. In addition, these users can add an “IM Me” button to their profile which allows people browsing their profile on the Web to initiate an IM conversation with them using a Microsoft-provided Web IM widget on the Bebo website which communicates with the Windows Live Messenger client on the profile owner’s desktop.

    The above scenarios were demoed at this year's MIX '07 conference during the session Broaden Your Market with Windows Live. The current plan is for the pieces that power this scenario, namely the APIs for interacting with the Windows Live Messenger service and the IM widgets that can be embedded within a non-Microsoft website, to be made available via http://dev.live.com in the near future.

At the end of the day, it is all about putting users in control. We don’t believe that a user’s social graph should be trapped in a roach motel of our creation. Instead users should be able to export their contact lists from our service on their own terms and should be able to grow their social graph within Windows Live without having to exclusively use our services.

It’s your data, not ours. If you want it, you can have it. Hopefully, the rest of the industry comes around to this sort of thinking sooner rather than later.

Stay tuned, there’s more to come.

Now playing: Gucci Mane - So Icy (feat. Young Jeezy)


 

I just spent an hour doing some research in response to Sam Ruby's post Sousveillance where he wonders whether some of the descriptions of Facebook as a social graph roach motel (i.e. information about your relationships goes in, nothing comes out) is accurate. Sam writes

Dare seems to think that the root problem is oppression by the “man”.  In this case, a 23 year old.  Brad seems to view this as a technical problem.

I wonder what I wrote that gave that impression, especially in the linked post. In that post, I was simply giving some advice about the kind of social problems you will face when you treat unifying social graphs across different contexts and applications as a purely technical problem. If anyone is whining about oppression by Facebook, it would be Brad’s original manifesto which mentions the site by name over a dozen times.

Data point 1: one day when logging onto Facebook, I saw an offer to scan my AIM contacts and invite the ones that had Facebooks to be friends.  I unselected a few, and then clicked on submit.  Within hours, my network expanded greatly.  IM ids serve as useful foreign keys.

Like lots of popular social networking services, but not Windows Live Spaces, Facebook is fond of violating the terms of use of various email providers by screen scraping user address books and contact lists after collecting their log-in credentials.

However Facebook prevents this from being done to them by only showing email addresses as images which expire after a couple of minutes due to use of session keys. I once considered writing an application to import my Facebook contacts into Outlook but gave up once I realized I couldn’t find any free, off-the-shelf OCR APIs that I could use.

I did find an article on CodeProject about rolling your own OCR via neural networks which seems promising but I don't have the free time to mess with that right now. Maybe later in the year. Sam also writes

Data point 2: Facebook is a platform with an API.  If there is a need, it seems to me that one could develop an application using FQL to pull one’s friend list out of Facebook and share it externally.  The fact that I don’t know of such an application means one of four things is happening: (1) it exists, but I don’t know about it, (2) despite the alleged overwhelming demand for this feature, and obvious commercial opportunities it opens up, it hadn’t occurred to anyone, (3) I’m reading the documentation wrong, and it isn’t possible for applications to obtain access to one’s own Facebook ID for use as a foreign key, or (4) the demand simply isn’t there.

Or (5) the information returned by FQL about a user contains no contact information (no email address, no IM screen names, no telephone numbers, no street address), so it is pretty useless as a way to utilize one’s friends list with applications besides Facebook since there is no way to cross-reference your friends using any personally identifiable association that would exist in another service.
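For the record, here is roughly what a friends-list query looks like; the user and friend tables and their fields are part of FQL, while the client object in the comment is a stand-in. The interesting part is what you cannot select, not the plumbing of the call:

```python
MY_UID = 12345  # placeholder Facebook user id

# FQL happily walks the friends list...
query = f"""
SELECT uid, name, pic
FROM user
WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = {MY_UID})
"""

# ...but the user table exposes no email address, IM screen name, phone number
# or street address, so there is no foreign key with which to match these
# people up against your contacts in any other service.
# results = facebook_client.fql.query(query)  # hypothetical client wrapper
print(query)
```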

When it comes to contact lists (i.e. the social graph), Facebook  is a roach motel. Lots of information about user relationships goes in but there’s no way for users or applications to get it out easily. Whenever an application like FacebookSync comes along which helps users do this, it is quickly shut down for violating their Terms of Use. Hypocrisy? Indeed.

Now playing: Lil Boosie & Webbie - Wipe Me Down (remix) (feat. Jim Jones, Fat Joe, Jadakiss & Foxx)


 

Categories: Social Software

I just read the post on the Skype weblog entitled What happened on August 16 about the cause of their outage which states

On Thursday, 16th August 2007, the Skype peer-to-peer network became unstable and suffered a critical disruption. The disruption was triggered by a massive restart of our users’ computers across the globe within a very short time frame as they re-booted after receiving a routine set of patches through Windows Update.

The high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.

Normally Skype’s peer-to-peer network has an inbuilt ability to self-heal, however, this event revealed a previously unseen software bug within the network resource allocation algorithm which prevented the self-healing function from working quickly.

This problem affects all networks that handle massive numbers of concurrent user connections, whether they are peer-to-peer or centralized. When you deal with tens of millions of users logged in concurrently and something causes a huge chunk of them to log in at once (e.g. after an outage or a synchronized computer reboot due to operating system patches) then your system will be flooded with log-in requests. All the major IM networks (including Windows Live) have all sorts of safeguards in place within the system to prevent this from taking down their networks, although how many short outages are due to this specific issue is anybody’s guess.
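One standard safeguard (a generic sketch, not a description of any particular IM network's internals) is to have clients back off exponentially with random jitter when a login attempt fails, so a synchronized reboot doesn't turn into a synchronized, repeating thundering herd of log-in requests:

```python
import random
import time

def login_with_backoff(attempt_login, max_retries=8, base_delay=1.0, max_delay=300.0):
    """Retry a failed login with exponential backoff plus random jitter.

    attempt_login: callable that returns True on success, False on failure.
    """
    for attempt in range(max_retries):
        if attempt_login():
            return True
        # Exponential backoff spreads retries over time; jitter keeps every
        # freshly rebooted client from retrying at exactly the same moment.
        delay = min(max_delay, base_delay * (2 ** attempt))
        time.sleep(random.uniform(0, delay))
    return False

# Example: a flaky login endpoint that succeeds 20% of the time.
# login_with_backoff(lambda: random.random() < 0.2)
```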

However Skype has an additional problem when such events happen due to its peer-to-peer model, which is described in the blog post All Peer-to-Peer Models Are NOT Created Equal -- Skype's Outage Does Not Impugn All Peer-to-Peer Models

According to Aron, like its predecessor Kazaa, Skype uses a different type of Peer-To-Peer network than most companies. Skype uses a system called SuperNodes. A SuperNode Peer-to-Peer system is one in which you rely on your customers rather than your own servers to handle the majority of your traffic. SuperNodes are just normal computers which get promoted by the Skype software to serve as the traffic cops for their entire network. In theory this is a good idea, but the problem happens if your network starts to destabilize. Skype, as a company, has no physical or programmatic control over the most vital piece of its product. Skype instead is at the mercy of and vulnerable to the people who unknowingly run the SuperNodes.

This of course exposes vulnerabilities to any business based on such a system -- systems that, in effect, are not within the company's control.

According to Aron, another flaw with SuperNode models concerns system recovery after a crash. Because Skype lost its SuperNodes in the initial crash, its network can only recover as fast as new SuperNodes can be identified.

This design leads to a vicious cycle when it comes to recovering from an outage. With most of the computers on the network being rebooted, Skype lost a bunch of SuperNodes, and so when the computers came back online they flooded the remaining SuperNodes, which in turn went down, and so on…

All of this is pretty understandable. What I don’t understand is why this problem is just surfacing. After all, this isn’t the first patch Tuesday. Was the bug in their network resource allocation process introduced in a recent version of Skype? Has the service been straining for months and last week was just the tipping point? Is this only half the story and there is more they aren’t telling us?

Hmmm… 

Now playing: Shop Boyz - Party Like A Rockstar (remix) (feat. Lil' Wayne, Jim Jones & Chamillionaire)


 

Categories: Technology

My job at Microsoft is working on the contacts platform that is utilized by a number of Windows Live services. The contacts platform is a unified graph of the relationships our users have created across Windows Live. It includes a user's Windows Live Hotmail contacts, their Windows Live Spaces friends, their Windows Live Messenger buddies and anyone they've added to an access control list (e.g. people who can access their shared folders in Windows Live Skydrive or the events in their calendar). Basically, a while ago one of our execs thought it didn't make sense to build a bunch of social software applications each acting as a silo of user relationships and that instead we should have a unified graph of the user to user relationships within Windows Live. Fast forward a couple of years and we now have a clearer idea of the pros and cons of building a unified social graph.

Given the above, it should be no surprise that I read Brad Fitzpatrick's Thoughts on the Social Graph with keen interest since it overlaps significantly with my day job. I was particularly interested in the outlined goals for the developers API which are included below

For developers who don't want to do their own graph analysis from the raw data, the following high-level APIs should be provided: 

  1. Node Equivalence, given a single node, say "brad on LiveJournal", return all equivalent nodes: "brad" on LiveJournal, "bradfitz" on Vox, and 4caa1d6f6203d21705a00a7aca86203e82a9cf7a (my FOAF mbox_sha1sum). See the slides for more info.
  2. Edges out and in, by node. Find all outgoing edges (where edges are equivalence claims, equivalence truths, friends, recommendations, etc). Also find all incoming edges.
  3. Find all of a node's aggregate friends from all equivalent nodes, expand all those friends' equivalent nodes, and then filter on destination node type. This combines steps 1 and 2 and 1 in one call. For instance, Given 'brad' on LJ, return me all of Brad's friends, from all of his equivalent nodes, if those [friend] nodes are either 'mbox_sha1sum' or 'Twitter' nodes.
  4. Find missing friends of a node. Given a node, expand all equivalent nodes, find aggregate friends, expand them, and then report any missing edges. This is the "let the user sync their social networking sites" API. It lets them know if they were friends with somebody on Friendster and they didn't know they were both friends on MySpace, they might want to be.

Here are the top three problems Brad and the rest of the Google folks working on this project will have to factor in as they chase the utopia that is a unified social graph.
  1. Some Parts of the Graph are Private: Although social networking sites with publicly articulated social networks are quite popular (e.g. MySpace), there is a larger number of private or semi-private social networks that either can only be viewed by the owner of the list (e.g. IM buddy lists) or by some subset of the graph (e.g. private profiles on social networking sites like MySpace, Facebook, Windows Live Spaces, etc). The latter is especially tricky to deal with. In addition, people often have more non-public articulated social networks (i.e. friends lists) than public ones despite the popularity of social networking sites with public profiles.

  2. Inadvertent Information Disclosure caused by Linking Nodes Across Social Networks: The "find missing friends of a node" feature in Brad's list sounds nice in theory but it includes a number of issues that users often consider to be privacy violations or just plain creepy. Let's say I have Batman on my friends list on MySpace because I think the caped crusader is cool. Then I join LiveJournal, it calls the find_missing_friends() API to identify which of my friends from other sites are using LiveJournal, and it finds Bruce Wayne's LiveJournal. Oops, an API call just revealed Batman's secret identity. A less theoretical version of this problem occurred when we first integrated Windows Live Spaces with Windows Live Messenger, and some of our Japanese beta users were stunned to find that their supposedly anonymous blog postings were now a click away for their IM buddies to see. I described this situation briefly in my submission to the 2005 Social Computing Symposium.

  3. All "Friends" aren't Created Equal: Another problem is that most users don't want all their "friends" available in all their applications. One capability we were quite proud off at one time is that if you had Pocket MSN Messenger then we merged the contacts on your cell phone with your IM and email contacts. A lot of people were less than impressed by this behavior. Someone you have on your IM buddy list isn't necessarily someone you want in your cell phone address book. Over the years, I've seen more examples of this than I can count. Being "friends" in one application does not automatically mean that two users want to be "friends" in a completely different context.

These are the kinds of problems we've had to deal with on my team while also trying to make this scale to being accessed by services utilized by hundreds of millions of users. I've seen what it takes to build a system like this first hand and Brad & company have their work cut out for them. This is without considering the fact that they may have to deal with ticked off users or ticked off social networking sites depending on how exactly they plan to build this giant database of user friend lists.
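To ground both the appeal of Brad's proposed API and problem #2 above, here is a toy sketch (all names and data made up) of node equivalence plus "find missing friends"; notice that merely linking equivalent nodes is what exposes Batman's other identity:

```python
# (site, username) -> identities on other sites claimed to be the same person.
equivalent_nodes = {
    ("myspace", "batman"): {("livejournal", "bruce_wayne")},
    ("livejournal", "bruce_wayne"): {("myspace", "batman")},
    ("myspace", "dare"): {("livejournal", "carnage4life")},
    ("livejournal", "carnage4life"): {("myspace", "dare")},
}

# Publicly articulated friends lists, per site.
friends = {
    ("myspace", "dare"): {("myspace", "batman")},
    ("livejournal", "carnage4life"): set(),
}

def expand(node):
    """A node plus every node claimed to be equivalent to it."""
    return {node} | equivalent_nodes.get(node, set())

def find_missing_friends(node, target_site):
    """Friends from other sites who exist on target_site but aren't friends there yet."""
    suggestions = set()
    for me in expand(node):
        for friend in friends.get(me, set()):
            for alias in expand(friend):
                if alias[0] == target_site and alias not in friends.get(node, set()):
                    suggestions.add(alias)
    return suggestions

# Joining LiveJournal as carnage4life surfaces Bruce Wayne's journal, quietly
# linking it to the Batman profile on MySpace.
print(find_missing_friends(("livejournal", "carnage4life"), "livejournal"))
# {('livejournal', 'bruce_wayne')}
```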

PS: In case any of this sounds interesting to you, we're always hiring. :)


 

Categories: Platforms | Social Software

Brad Fitzpatrick, the founder of LiveJournal, who recently left Six Apart for Google has published notes on what he's going to be working on moving forward. It is an interesting read entitled Brad's Thoughts on the Social Graph which contains the following excerpts

Currently if you're a new site that needs the social graph (e.g. dopplr.com) to provide one fun & useful feature (e.g. where are your friends traveling and when?), then you face a much bigger problem then just implementing your main feature. You also have to have usernames, passwords (or hopefully you use OpenID instead), a way to invite friends, add/remove friends, and the list goes on. So generally you have to ask for email addresses too, requiring you to send out address verification emails, etc. Then lost username/password emails. etc, etc. If I had to declare the problem statement succinctly, it'd be: People are getting sick of registering and re-declaring their friends on every site., but also: Developing "Social Applications" is too much work.

Facebook's answer seems to be that the world should just all be Facebook apps.
...
Goals:
1. Ultimately make the social graph a community asset, utilizing the data from all the different sites, but not depending on any company or organization as "the" central graph owner. 
  1. Establish a non-profit and open source software (with copyrights held by the non-profit) which collects, merges, and redistributes the graphs from all other social network sites into one global aggregated graph. This is then made available to other sites (or users) via both public APIs (for small/casual users) and downloadable data dumps, with an update stream / APIs, to get iterative updates to the graph (for larger users)
...
Non-Goals:
  1. The goal is not to replace Facebook. In fact, most people I've talked to love Facebook, just want a bit more of their already-public data to be more easily accessible, and want to mitigate site owners' fears about any single data/platform lock-in. Early talks with Facebook about participating in this project have been incredibly promising. 

It seems to me that Facebook is the new Microsoft in that there is now a significant number of people who are either upset at the level of "lock-in" it has created or are just plain jealous of its "wealth", and who have started dedicated efforts to break its hegemony. It'll be interesting watching this play out.

From my perspective, I'm skeptical of a lot of the talk about social network portability because the conversation rarely seems to be user centric. Usually it's creators of competing services who are angry about "lock-in" because they can't get a new user's contacts from another service and spam them to gain "viral growth" for their service. As for the various claims of social network overload, only the power users and geeks who join a new social networking service every month (WTF is Dopplr?) have this problem.

A real social network is a community and users don't change communities at the drop of a hat. What I find more interesting is being able to bridge these communities instead of worrying about the 1% of users who hop from community to community like crack-addled hummingbirds flitting from flower to flower.

I'll put it this way, when it comes to email which is more important? The ability to send emails to people regardless of what email service or mail client they use or the ability to import your contact list from one free email service into another when you switch service providers?


 

I learned about the Facebook Data Store API yesterday from a post by Marc Canter. The API is intended to meet the storage needs of developers building applications on the Facebook platform. Before we decide if the API meets the needs of developers, we need to list what these needs are in the first place. A developer building a widget or application for a social network’s widget platform, such as a gadget for Windows Live Spaces or an application for the Facebook platform, needs to store

  1. Static resources that will be consumed or executed on the client such as images, stylesheets and script files. Microsoft provides this kind of hosting for gadget developers via Windows Live Gallery. This is all the storage needed for a gadget such as GMT clock.
  2. User preferences and settings related to the gadget. In many cases, a gadget may provide a personalized view of data (e.g. my Netflix queue or the local weather) or may simply have configuration options specific to the user which need to be saved. Microsoft provides APIs for getting, setting and deleting preferences as part of its Web gadgets framework. My Flickr badge gadget is an example of the kind of gadget that requires this level of storage.
  3. The application’s server-side code and application specific databases. This is the equivalent of the LAMP or WISC hosting you get from a typical Web hosting provider. No social networking site provides this for widget/gadget developers today. The iLike Facebook application is an example of the kind of application that requires this level of “storage” or at this level it should probably be called app hosting.

Now that we have an idea of the data storage needs of Web widget/gadget developers, we can now discuss how the Facebook Data Store API measures up. The API consists of three broad classes of methods; User Preferences, Persistent Objects and Associations. All methods can return results as XML, JSON or JSONP.

It is currently unclear if the API is intended to be RESTful or not since there is scant documentation of the wire format of requests or responses. 

User Preferences

User Preference methods
* data.setUserPreference update one preference
* data.setUserPreferences update multiple preferences in batch
* data.getUserPreference get one preference of a user
* data.getUserPreferences get all preferences of a user

These methods are used to store key value pairs which may represent user preferences or settings for an application. There is a limit of 201 key<->value pairs which can be stored per user. The keys are numeric values from 0 – 200 and the maximum length of a preference value is 128 characters. 
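As a hedged sketch of how an application might use these methods, assuming a client object that wraps the REST calls (the wrapper is hypothetical; the method names, slot numbers and size limits come from the description above):

```python
# Preferences are numeric slots 0-200, each holding a string of up to 128 characters.
FAVORITE_GENRE_SLOT = 0
LAST_VISITED_SLOT = 1

def save_settings(data_api, genre, last_visited):
    # data.setUserPreferences updates several slots in one batch call.
    data_api.setUserPreferences({
        FAVORITE_GENRE_SLOT: genre[:128],      # values longer than 128 chars won't fit
        LAST_VISITED_SLOT: last_visited[:128],
    })

def load_settings(data_api):
    # data.getUserPreferences returns every slot for the current user.
    prefs = data_api.getUserPreferences()
    return prefs.get(FAVORITE_GENRE_SLOT), prefs.get(LAST_VISITED_SLOT)
```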

Persistent Objects

Object Definition methods
* data.createObjectType create a new object type
* data.dropObjectType delete an object type and all objects of this type
* data.renameObjectType rename an object type
* data.defineObjectProperty add a new property
* data.undefineObjectProperty remove a previously defined property
* data.renameObjectProperty rename a previously defined property
* data.getObjectTypes get a list of all defined object types
* data.getObjectType get detailed definition of an object type

Developers can create new types which are analogous to SQL tables, especially when you consider terminology like “dropping” an object type, the ability to add new properties/columns to the type, and being able to retrieve the schema of the type, all of which are more common in the relational database world than in object oriented programming.

 

Object Manipulation methods
* data.createObject create a new object
* data.updateObject update an object's properties
* data.deleteObject delete an object by its id
* data.deleteObjects delete multiple objects by ids
* data.getObject get an object's properties by its id
* data.getObjects get properties of a list of objects by ids
* data.getObjectProperty get an object's one property
* data.setObjectProperty set an object's one property
* data.getHashValue get a property value by a hash key
* data.setHashValue set a property value by a hash key
* data.incHashValue increment/decrement a property value by a hash key
* data.removeHashKey delete an object by its hash key
* data.removeHashKeys delete multiple objects by their hash keys

This aspect of the API is almost self explanatory: you create an object type (e.g. a movie) then manipulate instances of this type using the above APIs. Each object can be accessed via a numeric ID or a string hash value. The object’s numeric ID is obtained when you first create the object, although it isn’t clear how you obtain an object’s hash key. It also seems like there is no generic query mechanism, so you need to store the numeric IDs or hash keys of the objects you are interested in somewhere so you don’t have to enumerate all objects looking for them later. Perhaps with the preferences API?
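Continuing the sketch with the same hypothetical wrapper, defining a "movie" type and storing an instance might look something like the following; the method names come from the list above, but the parameter shapes are guesses since the wire format isn't documented:

```python
def set_up_movie_store(data_api):
    # One-time setup: define a "movie" type with a couple of properties,
    # much like issuing CREATE TABLE / ALTER TABLE against a SQL database.
    data_api.createObjectType("movie")
    data_api.defineObjectProperty("movie", "title")
    data_api.defineObjectProperty("movie", "year")

def add_movie(data_api, title, year):
    # data.createObject hands back the numeric id of the new object; you have
    # to stash that id (or a hash key) yourself, since there is no query
    # mechanism for finding the object again later.
    return data_api.createObject("movie", {"title": title, "year": year})

def get_movie_title(data_api, movie_id):
    # data.getObjectProperty fetches a single property of an object by its id.
    return data_api.getObjectProperty(movie_id, "title")
```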

Associations

Association Definition methods
* data.defineAssociation create a new object association
* data.undefineAssociation remove a previously defined association and all its data
* data.renameAssociation rename a previously defined association
* data.getAssociationDefinition get definition of a previously defined association
* data.getAssociationDefinitions get definitions of all previously defined associations

An association is a named relationship between two objects. For example, "works_with" could be an association between two user objects. Associations don't have to be between the same types (e.g. a "works_at" could be an association between a user object and a company object). Associations take me back to WinFS and the son-of-WinFS Entity Data Model, which has a notion of a RelationshipType that is very similar to the above notion of an association. It is also similar to the notion of an RDF triple, but not quite.

Association Manipulation methods
* data.setAssociation create an association between two objects
* data.setAssociations create a list of associations between pairs of objects
* data.removeAssociation remove an association between two objects
* data.removeAssociations remove associations between pairs of objects
* data.removeAssociatedObjects remove all associations of an object
* data.getAssociatedObjects get ids of an object's associated objects
* data.getAssociatedObjectCount get count of an object's associated objects
* data.getAssociatedObjectCounts get counts of associated objects of a list of objects.
* data.getAssociations get all associations between two objects
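Rounding out the sketch with the same hypothetical wrapper, the "works_with" example from above might be wired up like this (again, the method names are from the lists above while the parameter shapes are guesses):

```python
def set_up_colleagues(data_api):
    # Define the named relationship between two user objects ("works_with"
    # from the example above); this is done once per association type.
    data_api.defineAssociation("works_with")

def record_colleagues(data_api, user_object_id, coworker_object_ids):
    # data.setAssociations creates several edges in one batch call.
    data_api.setAssociations([
        ("works_with", user_object_id, coworker_id)
        for coworker_id in coworker_object_ids
    ])

def get_colleagues(data_api, user_object_id):
    # data.getAssociatedObjects returns the ids of everything linked to this
    # object; the objects themselves are then fetched by id.
    colleague_ids = data_api.getAssociatedObjects("works_with", user_object_id)
    return data_api.getObjects(colleague_ids)
```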

All of these methods should be self explanatory. Although I think this association stuff is pretty sweet, I’m unclear as to where all of this is expected to fall in the hierarchy of needs of a Facebook application. The preferences stuff is a no brainer. The persistent object and association APIs could be treated as a very rich preferences API by developers, but that doesn’t seem to live up to their potential. On the other hand, without providing something closer to an app hosting platform like Amazon has done with EC2 + S3, I’m not sure there is any other use for them by Web developers using the Facebook platform.

Have I missed something here?

Now playing: UGK - International Players Anthem (feat. Outkast)


 

Categories: Platforms

If you go to http://dev.live.com/liveid you’ll see links to Windows Live ID for Web Authentication and Client Authentication which enable developers to build Web or desktop applications that can be used to authenticate users via Windows Live ID. The desktop SDK are still in alpha but the Web APIs have hit v1. You can get the details from the Windows Live ID team blog post entitled Windows Live ID Web Authentication SDK for Developers Is Released which states  

Windows Live ID Web Authentication allows sites who want to integrate with the Windows Live services and platform. We are releasing a set of tools that make this integration easier than ever.  

Web Authentication works by sending your users to the Windows Live ID sign-in page by means of a specially formatted link. The service then directs them back to your Web site along with a unique, site-specific identifier that you can use to manage personalized content, assign user rights, and perform other tasks for the authenticated user. Sign-in and account management is performed by Windows Live ID, so you don't have to worry about implementing these details.

Included with the Web Authentication software development kit (SDK) are QuickStart sample applications in the ASP.NET, Java, Perl, PHP, Python, and Ruby programming languages. You can get the sample applications for this SDK from the Web Authentication download page on Microsoft.com.

As one of the folks who's been championing opening up our authentication platform to Web developers, I think this is good news. I'm not particularly sold on using Windows Live ID as a single sign-on instead of sites managing their own identities, but I do think that now that we allow non-Microsoft applications (e.g. mashups, widgets, etc) to act on behalf of Windows Live users via this SDK, there'll be a burst of new APIs coming out of Windows Live that will allow developers to build applications that manipulate a user's data stored within Windows Live services.
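The flow described in the excerpt, reduced to a sketch (the URL, parameter names and token handling below are placeholders rather than the actual SDK interface): send the user to the sign-in page via a specially formatted link, then key your own user record off the site-specific identifier that comes back:

```python
SIGN_IN_URL = "https://login.live.com/wlogin.srf"  # illustrative; check the SDK docs
APP_ID = "YOUR_APP_ID"

def sign_in_link(return_url):
    # Step 1: send the user to the Windows Live ID sign-in page via a
    # specially formatted link that identifies your site.
    return f"{SIGN_IN_URL}?appid={APP_ID}&returnurl={return_url}"

def validate_token(signed_token):
    # Placeholder: the real SDK verifies the token with your site's secret key
    # and extracts the unique, site-specific user identifier.
    return signed_token.split(":", 1)[0]

def handle_sign_in_response(query_params, sessions):
    # Step 2: the service redirects the user back to your site; validate the
    # token it sends and key your own user record off the identifier. Sign-in
    # and account management stay with Windows Live ID, so your site never
    # sees the user's password.
    user_id = validate_token(query_params["token"])
    sessions[user_id] = {"signed_in": True}
    return user_id

# print(sign_in_link("http://example.com/auth-handler"))
```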

Opening up our platform will definitely be good for users and will be good for the Web as well. Kudos, to the Windows Live ID folks for getting this out.

Now playing: Nappy Roots - Po' Folks


 

Categories: Web Development | Windows Live

How social networks handle multiple social contexts (e.g. my work life versus my personal life) has been on my mind this week. Today I was in a meeting where someone mentioned that most of the people he knows have profiles on both MySpace and Facebook because their real friends are on MySpace while their work friends are on Facebook. This reminded me that my wall currently has a mix of posts from Robert Scoble about random geek crap and posts by friends from Nigeria who I haven’t talked to in years catching up with me.

For some reason I find this interleaving of my personal relationships and my public work-related persona somewhat unsettling. Then there’s this post by danah boyd, loss of context for me on Facebook which contains the following excerpt

Anyhow, I know folks are still going wheeeeee about Facebook. And I know people generally believe that growth is nothing but candy-coated goodness. And while I hate using myself as an example (cuz I ain't representative), I do feel the need to point out that context management is still unfun, especially for early adopters, just as it has been on every other social network site. It sucks for teens trying to balance mom and friends. It sucks for college students trying to have a social life and not piss off their profs. It sucks for 20-somethings trying to date and balance their boss's presence. And it sucks for me.

I can't help but wonder if Facebook will have the same passionate college user base next school year now that it's the hip adult thing. I don't honestly know. But so far, American social network sites haven't supported multiple social contexts tremendously well. Maybe the limited profile and privacy settings help, but I'm not so sure. Especially when profs are there to hang out with their friends, not just spy on their students. I'm wondering how prepared students are to see their profs' Walls filled with notes from their friends. Hmmm...

as usual danah hits the nail on the head. There are a number of ways I can imagine social network sites doing a better job at supporting multiple social contexts, but they all involve requiring some work from the user to set up their social contexts, especially if these sites plan to become a permanent fixture in their users' lives. However most social network sites seem more interested in being the equivalent of popular nightclubs (e.g. MySpace) than in becoming a social utility in the same way that email and instant messaging have become. Facebook is the first widely popular social networking site that I suspect will buck this trend. If there is one place where there is still major room for improvement in their user experience (besides the inability to opt out of all the annoying application requests), it's here. This is the one area where the site is weak, and if my experience and danah's observations are anything to go by, unless it improves the site will eventually become less of a social software utility and more of a place to hang out, and we know what eventually happens to sites like that.

Now playing: Gym Class Heroes - New Friend Request


 

Categories: Social Software

Matt Cutts has a blog post entitled Closing the loop on malware where he writes

Suppose you worked at a search engine and someone dropped a high-accuracy way to detect malware on the web in your lap (see this USENIX paper [PDF] for some of the details)? Is it better to start protecting users immediately, or to wait until your solution is perfectly polished for both users and site owners? Remember that the longer you delay, the more users potentially visit malware-laden web pages and get infected themselves.

Google chose to protect users first and then quickly iterate to improve things for site owners. I think that’s the right choice, but it’s still a tough question. Google started flagging sites where we detected malware in August of last year.

When I got home yesterday, my fiancée informed me that her laptop was infected with spyware. I asked how it happened and she mentioned that she'd been searching for sites to pimp her MySpace profile. Since we'd talked in the past about visiting suspicious websites, I wondered why she'd chosen to ignore my advice. Her response? “Google didn’t put the This Site May Harm Your Computer warning on the link so I thought the site was safe. Google failed me.”

I find this interesting on several levels. There’s the fact that this feature is really useful and engenders a sense of trust in Google’s users. Then there’s the palpable sense of betrayal on the user’s part when Google’s “not yet perfectly polished” algorithms for detecting malicious software fail to flag a bad site. Finally, there’s the observation that instead of blaming Microsoft, who produces both the operating system and the Web browser that were infected by the spyware, she chose to blame Google, who produced the search engine that led her to the malicious site. Why do you think this is? I have my theories…

Now playing: Hurricane Chris - Ay Bay Bay


 

Categories: Technology

August 14, 2007
@ 03:19 AM

Recently I've seen a bunch of people I consider to be really smart sing the praises of Hadoop, such as Sam Ruby in his post Long Bets, Tim O’Reilly in his post Yahoo!’s Bet on Hadoop, and Bill de hÓra in his post Phat Data. I haven’t dug too deeply into Hadoop because the legal folks at work will chew out my butt if I did, but there are a number of little niggling doubts that make me wonder if this is the savior of the world that all these geeks claim it will be. Here are some random thoughts that have made me skeptical

  1. Code Quality: Hadoop was started by Doug Cutting who created Lucene and Nutch. I don’t know much about Nutch but I am quite familiar with Lucene because we adopted it for use in RSS Bandit. This is probably the worst decision we’ve made in the entire history of RSS Bandit. Not only are the APIs a usability nightmare because they were poorly hacked out then never refactored, the code is also notoriously flaky when it comes to dealing with concurrency, so the common advice is to never use multiple threads to do anything with Lucene.

  2. Incomplete Specifications: Hadoop’s MapReduce and HDFS are re-implementations of Google’s MapReduce and Google File System (GFS) technologies. However it seems unwise to base a project on research papers that, for competitive reasons, may not reveal all the details needed to implement the service. For example, the Hadoop documentation is silent on how it plans to deal with the election of a primary/master server among peers, especially in the face of machine failure, which Google solves using the Chubby lock service. It just so happens that there is a research paper that describes Chubby, but how many other services within Google’s data centers do MapReduce and GFS depend on which are yet to have their own public research paper? Speaking of which, where are the Google research papers on their message queueing infrastructure? You know they have to have one, right? How about their caching layer? Where are the papers on Google’s version of memcached? Secondly, what is the likelihood that Google will be as forthcoming with these papers now that they know competitors like Yahoo! are knocking off their internal architecture?

  3. A Search Optimized Architecture isn’t for Everyone: One of the features of MapReduce is that one can move the computation close to the data because “Moving Computation is Cheaper than Moving Data”. This is especially important when you are doing lots of processing intensive operations, such as the kind of data analysis that goes into creating the Google search index. However, if you’re a site whose main tasks are reading and writing lots of data (e.g. MySpace) or sending lots of transient messages back and forth while ensuring that they always arrive in the right order (e.g. Google Talk), then these optimizations and capabilities aren’t much use to you and a different set of tools would serve you better. (For the unfamiliar, a minimal sketch of the MapReduce model follows this list.)
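Here is a minimal, single-process sketch of the MapReduce programming model using the canonical word count example. This is a purely illustrative toy of my own invention, not Hadoop's or Google's actual interfaces, and it omits the distribution, scheduling and fault tolerance that make the real systems interesting.

using System;
using System.Collections.Generic;

// Toy illustration of the MapReduce model. All names here are made up;
// this is not Hadoop's API.
public class WordCountExample {

  // Map: emit a (word, 1) pair for every word in a line of input.
  static IEnumerable<KeyValuePair<string, int>> Map(string line) {
    foreach (string word in line.Split(' ')) {
      if (word.Length > 0)
        yield return new KeyValuePair<string, int>(word.ToLower(), 1);
    }
  }

  // Reduce: sum all the counts emitted for a single word.
  static int Reduce(string word, List<int> counts) {
    int total = 0;
    foreach (int c in counts) total += c;
    return total;
  }

  public static void Main() {
    string[] input = { "the quick brown fox", "the lazy dog", "the fox" };

    // The "shuffle" phase groups intermediate values by key. In a real system
    // this happens across many machines, with map tasks scheduled on the nodes
    // that already hold the input data ("moving computation to the data").
    Dictionary<string, List<int>> groups = new Dictionary<string, List<int>>();
    foreach (string line in input) {
      foreach (KeyValuePair<string, int> pair in Map(line)) {
        if (!groups.ContainsKey(pair.Key)) groups[pair.Key] = new List<int>();
        groups[pair.Key].Add(pair.Value);
      }
    }

    foreach (KeyValuePair<string, List<int>> group in groups)
      Console.WriteLine("{0}: {1}", group.Key, Reduce(group.Key, group.Value));
  }
}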

I believe there are a lot of lessons that can be learned from how the distributed systems that power the services behind Google, Amazon and the like are designed and built. However I think it is waaaay too early to be crowning some knock-off of one particular vendor's internal infrastructure as the future of distributed computing as we know it.

Seriously.

PS: Yes, I realize that Sam and Bill are primarily pointing out the increasing importance of parallel programming as it relates to the dual trends of (i) almost every major website that ends up dealing with lots of data and lots of traffic eventually eschewing relational database features like joins, normalization, triggers and transactions because they are not cost effective and (ii) the increasingly large amounts of data that we generate and now have to process due to falling storage costs. Even though their mentions of Hadoop are incidental, it still seems to me that it’s almost become a meme, one which deserves more scrutiny before we jump on that particular bandwagon. 

Now playing: N.W.A. - Appetite For Destruction


 

Categories: Platforms

It seems like I was just blogging about Windows Live Hotmail coming out of beta and it looks like there is already a substantial update to the service being rolled out. From the Windows Live Hotmail team’s blog post entitled August: Hotmail will soon bring you more of your requests, better performance we learn

We went out of beta in May, and we’re already releasing something new. Today, these new features will begin to roll out gradually to all our customers over the next few weeks, so if you don’t immediately see them, be patient, they’re coming!

More storage! Just when you were wondering how you’d ever fill up 2 or 4 GB of mail, we’ve given you more storage. Free users will get 5 GB and paid users will get 10 GB of Hotmail storage.

Contacts de-duplication: Do you have five different entries for the same person in your Contacts? Yeah, me too, but not anymore. We’re the first webmail service to roll out “contacts de-duplication”. If you get a message from “Steve Kafka” and click “add contact” but there’s already a Steve Kafka, we’ll let you know and let you add Steve’s other e-mail address to your existing “Steve Kafka” contact entry. We’re just trying to be smarter to make your life easier and faster. There’s also a wizard you can run to clean up your existing duplicate contacts.

Accepting meeting requests: If you receive a meeting request, such as one sent from Outlook, you can now click “accept” and have it added to your Calendar. This had existed for years in MSN Hotmail, and we’re adding it to Windows Live Hotmail now.

You can turn off the Today page (if you want to). If you’d rather see your inbox immediately upon login, you have the option to turn off the page of MSN news (called the Today page). The choice is yours. 

A nice combination of new features and pet peeves fixed with this release. The contacts duplication issue is particularly annoying and one I’ve wanted to see fixed for quite a while.

So far we’ve seen updates to Spaces, SkyDrive, and now Mail within the past month. The summer of Windows Live is on and so far it’s looking pretty good. I wonder what else Windows Live has up its sleeve?

Now playing: P. Diddy - That's Crazy (remix) (feat. Black Rob, Missy Elliott, Snoop Dogg & G-Dep)


 

Categories: Windows Live

There was an article on Ars Technica this weekend entitled Google selleth then taketh away, proving the need for DRM circumvention which is yet another example of how users can be screwed when they bet on a platform that utilizes DRM. The article states

It's not often that Google kills off one of its services, especially one which was announced with much fanfare at a big mainstream event like CES 2006. Yet Google Video's commercial aspirations have indeed been terminated: the company has announced that it will no longer be selling video content on the site. The news isn't all that surprising, given that Google's commercial video efforts were launched in rather poor shape and never managed to take off. The service seemed to only make the news when embarrassing things happened.

Yet now Google Video has given us a gift—a "proof of concept" in the form of yet another argument against DRM—and an argument for more reasonable laws governing copyright controls.

Google contacted customers late last week to tell them that the video store was closing. The e-mail declared, "In an effort to improve all Google services, we will no longer offer the ability to buy or rent videos for download from Google Video, ending the DTO/DTR (download-to-own/rent) program. This change will be effective August 15, 2007."

The message also announced that Google Checkout would issue credits in an amount equal to what those customers had spent at the Google Video store. Why the quasi-refunds? The kicker: "After August 15, 2007, you will no longer be able to view your purchased or rented videos."

See, after Google takes its video store down, its Internet-based DRM system will no longer function. This means that customers who have built video collections with Google Video offerings will find that their purchases no longer work. This is one of the major flaws in any DRM system based on secrets and centralized authorities: when these DRM data warehouses shut down, the DRM stops working, and consumers are left with useless junk.

Furthermore, Google is not refunding the total cost of the videos. To take advantage of the credit Google is offering, you have to spend more money, and furthermore, you have to spend it with a merchant that supports Google Checkout. Meanwhile, the purchases you made are now worthless.

This isn't the first time nor will it be the last time that some big company gives up on a product strategy tied to DRM, thus destroying thousands of dollars in end user investments. I wonder how many more fiascos it will take before consumers wholeheartedly reject DRM* or government regulators are forced to step in.

 Now playing: Panjabi MC - Beware (feat. Jay-Z)


 

Categories: Technology

Disclaimer: This blog post does not reflect future product announcements, technical strategy or advice from my employer. Disregard this disclaimer at your own risk.

In my previous post Some Thoughts on Open Social Networks, I gave my perspective on various definitions of "open social network" in response to the Wired article Slap in the Facebook: It's Time for Social Networks to Open Up. However there was one aspect of the article that I overlooked when I first read it. The first page of the article ends with the following exhortation.

We would like to place an open call to the web-programming community to solve this problem. We need a new framework based on open standards. Think of it as a structure that links individual sites and makes explicit social relationships, a way of defining micro social networks within the larger network of the web.

This is a problem that interests me personally. I have a Facebook profile while my fiancée has a MySpace profile. Since I’m now an active user of Facebook, I’d like her to be able to be part of my activities on the site such as being able to view my photos, read my wall posts and leave wall posts of her own. I could ask her to create a Facebook account, but I already asked her to create a profile on Windows Live Spaces so we could be friends on that service and quite frankly I don’t think she’ll find it reasonable if I keep asking her to jump from social network to social network because I happen to try out a lot of these services as part of my day job. So how can this problem be solved in the general case?

OpenID to the Rescue

This is exactly the kind of problem that OpenID was designed to solve.  The first thing to do is to make sure we all have the same general understanding of how OpenID works. It's basically the same model as Windows Live ID (formerly Microsoft Passport), Google Account Authentication for Web-Based Applications and Yahoo! Browser Based Authentication. A website redirects you to your identity provider, you authenticate yourself (i.e. login) on your identity provider's site and then are redirected back to the referring site along with your authentication ticket. The ticket contains some information about you that can be used to uniquely identify you as well as some user data that may be of interest to the referring site (e.g. username).

So how does this help us? Let’s say MySpace was an OpenID provider, which is a fancy way of saying that I can use my MySpace account to log in to any site that accepts OpenIDs. And now let’s say Facebook was a site that accepted OpenIDs as an identification scheme. This means that I could add my fiancée to the access control list of people who could view and interact with my profile on Facebook by using the URL of her MySpace profile as my identifier for her. So when she tries to access my profile for the first time, she is directed to the Facebook login page where she has the option of logging in with her MySpace credentials. When she chooses this option she is directed to the MySpace login page. After logging into MySpace with the proper credentials, she is redirected back to Facebook and gets a pseudo-account on the service which allows her to participate in the site without having to go through an account creation process.
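To make that flow a little more concrete, here is a rough sketch of the first leg of the handshake from the relying party's (i.e. Facebook's) side, namely constructing the redirect that sends the user off to her identity provider. The endpoint and profile URLs below are made up for illustration (MySpace isn't actually an OpenID provider today), and a real implementation would also perform discovery on the identifier, establish an association with the provider and verify the signed response.

using System;

// Hypothetical sketch of a relying party building an OpenID 2.0
// checkid_setup redirect. All URLs are invented for illustration.
public class OpenIdRedirectSketch {

  public static string BuildCheckIdUrl(string providerEndpoint, string claimedId, string returnTo) {
    return providerEndpoint
      + "?openid.ns=" + Uri.EscapeDataString("http://specs.openid.net/auth/2.0")
      + "&openid.mode=checkid_setup"
      + "&openid.claimed_id=" + Uri.EscapeDataString(claimedId)
      + "&openid.identity=" + Uri.EscapeDataString(claimedId)
      + "&openid.return_to=" + Uri.EscapeDataString(returnTo)
      + "&openid.realm=" + Uri.EscapeDataString("http://www.facebook.com/");
  }

  public static void Main() {
    // Redirect the user's browser to this URL; after she logs in at MySpace
    // she is sent back to the return_to URL with a signed response.
    string url = BuildCheckIdUrl(
        "http://www.myspace.com/openid/provider",   // hypothetical provider endpoint
        "http://www.myspace.com/jenna",             // her profile URL as the identifier
        "http://www.facebook.com/openid/return");   // hypothetical return URL
    Console.WriteLine(url);
  }
}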

Now that the user has a pseudo-account on Facebook, wouldn’t it be nice if when someone clicked on them they got to see a Facebook profile? This is where OpenID Attribute Exchange can be put to use. You could define a set of required and optional attributes that are exchanged as part of social network interop using OpenID. So we can insert an extra step [which may be hidden from the user] after the user is redirected back to Facebook from MySpace where the user’s profile information is requested. Here is an example of the kind of request that could be made by Facebook after a successful log-in attempt by a MySpace user.

openid.ns.ax=http://openid.net/srv/ax/1.0
openid.ax.type.fullname=http://example.com/openid/sn_schema/fullname
openid.ax.type.gender=http://example.com/openid/sn_schema/gender
openid.ax.type.relationship_status=http://example.com/openid/sn_schema/relationship_status
openid.ax.type.location=http://example.com/openid/sn_schema/location
openid.ax.type.looking_for=http://example.com/openid/sn_schema/looking_for
openid.ax.type.fav_music=http://example.com/openid/sn_schema/fav_music
openid.ax.count.fav_music=3
openid.ax.required=fullname,gender,location
openid.ax.if_available=relationship_status,looking_for,fav_music

which could return the following results

openid.ns.ax=http://openid.net/srv/ax/1.0
openid.ax.type.fullname=http://example.com/openid/sn_schema/fullname
openid.ax.type.gender=http://example.com/openid/sn_schema/gender
openid.ax.type.relationship_status=http://example.com/openid/sn_schema/relationship_status
openid.ax.type.location=http://example.com/openid/sn_schema/location
openid.ax.type.looking_for=http://example.com/openid/sn_schema/looking_for
openid.ax.type.fav_music=http://example.com/openid/sn_schema/fav_music
openid.ax.value.fullname=Jenna
openid.ax.value.gender=F
openid.ax.value.relationship_status=Single
openid.ax.value.location=Seattle, WA, United States
openid.ax.value.looking_for=Friends
openid.ax.value.fav_music=hiphop,country,pop
openid.ax.update_url=http://www.myspace.com/url_to_send_changes_made_to_profile

With the information returned by MySpace, one can now populate a place holder Facebook profile for the user.
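As a sketch of what populating that placeholder profile might look like on the Facebook side, the snippet below maps the attribute exchange response above into a simple profile object. The class and field names are made up for illustration; the dictionary keys mirror the hypothetical sn_schema attributes used in the example.

using System;
using System.Collections.Generic;

// Hypothetical placeholder profile built from the AX response shown above.
public class PlaceholderProfile {
  public string FullName;
  public string Gender;
  public string RelationshipStatus;
  public string Location;
  public string UpdateUrl; // where the provider says profile changes can be sent

  static string Get(IDictionary<string, string> response, string key) {
    string value;
    return response.TryGetValue(key, out value) ? value : null;
  }

  public static PlaceholderProfile FromAxResponse(IDictionary<string, string> response) {
    PlaceholderProfile p = new PlaceholderProfile();
    p.FullName = Get(response, "openid.ax.value.fullname");
    p.Gender = Get(response, "openid.ax.value.gender");
    p.RelationshipStatus = Get(response, "openid.ax.value.relationship_status");
    p.Location = Get(response, "openid.ax.value.location");
    p.UpdateUrl = Get(response, "openid.ax.update_url");
    return p;
  }
}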

Why This Will Never Happen

The question at the tip of your tongue is probably “If we can do this with OpenID today, how come I haven’t heard of anyone doing this yet?”. As usual when it comes to interoperability, the primary reasons for the lack of it are business related, not technical. When you look at the long list of OpenID providers, you may notice that there is no similarly long list of sites that accept OpenID credentials. In fact, there is no such list of sites readily available because the number of them is an embarrassing fraction of the number of sites that act as OpenID providers. Why this discrepancy?

If you look around, you’ll notice that the major online services such as Yahoo! via BBAuth, Microsoft via Windows Live ID (formerly Passport), and AOL via OpenID all provide ways for third party sites to accept user credentials from their sites. This increases the value of having an account on these services because it means that now that I have a Windows Live ID I can not only log in to various Microsoft properties across MSN and Windows Live but also to non-Microsoft sites like Expedia. This increases the likelihood that I’ll get an account with the service, which makes it more likely that I’ll be a regular user of the service, which means $$$. On the other hand, accepting OpenIDs does the exact opposite. It actually reduces the incentive to create an account on the site, which reduces the likelihood I’ll be a regular user of the site, which means less $$$. Why do you think there is no OpenID link on the AOL sign-in page even though the company is quick to brag about creating 63 million OpenIDs?

Why would Facebook implement a feature that reduced their user growth via network effects? Why would MySpace make it easy for sites to extract user profile information from their service? Because openness is great? Yeah…right.

Openness isn’t why Facebook is currently being valued at $6 billion nor is it why MySpace is currently expected to pull in about half a billion in revenue this year. These companies are doing just great being walled gardens and thanks to network effects, they will probably continue to do so unless something really disruptive happens.   

PS: Marc Canter asks if I can attend the Data Sharing Summit between Sept. 7th – 8th. I’m not sure I can since my wedding + honeymoon is next month. Consider this my contribution to the conversation if I don’t make it.

Now playing: Wu-Tang Clan - Can It Be All So Simple


 

Today some guy in the hallway mistook me for the other black guy that works in our building. Like we all look alike. Or there can only be one black guy that works in a building at Microsoft. Must be a quota. :)

Then I find this video in my RSS feeds and surprisingly I find my name mentioned in the comment threads.

Too bad it wasn't funny.


 

The speculation on LiveSide was right. Windows Live Folders is now Windows Live SkyDrive. You can catch the announcement on the product team's blog post Introducing Windows Live SkyDrive! which states

It’s been a month and a half since our first release, and today we’re making three major announcements!

First, we’re happy to announce our new name:



Second, we’ve been listening intently to your feedback and suggestions, and based directly on that feedback, we’re excited to bring you our next release, featuring:

  • An upgraded look and feel — new graphics to go along with your new features!
  • "Also on SkyDrive" — easily get back to the SkyDrives you’ve recently visited
  • Thumbnail images — we heard you loud and clear, and now you can see thumbnails of your image files
  • Drag and drop your files — sick of our five-at-a-time upload limit? Drag and drop your files right onto your SkyDrive
  • Embed your stuff anywhere — with just a few clicks, post your files and folders anywhere you can post html

Third, we’re excited to introduce SkyDrive in two additional regions: UK and India.

It's great to see this getting out to the general public. It's been pretty sweet watching this come together over the past year. I worked on some of the storage and permissioning platform aspects of this last year and I was quite impressed by a lot of the former members of the Microsoft Max team who are now working on this product.

We definitely have a winner here.  Check it out.

UPDATE: Someone asked for a video or screencast of the site in action. There's one on the Windows Vista team blog. It is embedded below


Demo: Windows Live SkyDrive

Now playing: 50 Cent - Outta Control (remix) (feat. Mobb Deep)


 

Categories: Windows Live

This weekend, I finally decided to step into the 21st century and began the process of migrating RSS Bandit to v2.0 of the .NET Framework. In addition, we've also moved our source code repository from CVS to Subversion and so far it's been a marked improvement. Since the .NET Framework is currently on v3.0 and v3.5 is in beta 1, I'm fairly out of date when it comes to the pet peeves in my favorite programming tools. At least one of my pet peeves was fixed: in Visual Studio 2005 I finally have an IDE where "Find References to this Method" actually works. On the flip side, the introduction of generics has added a lot more frustrating moments than I expected. By now most .NET developers have seen the dreaded

Cannot convert from 'System.Collections.Generic.List&lt;subtype of T&gt;' to 'System.Collections.Generic.List&lt;T&gt;'

For those of you who aren't familiar with C# 2.0, here are examples of code that works and code that doesn't work. The difference is often subtle enough to be quite irritating when you first encounter it.

WORKS! - Array[subtype of T]  implicitly casted to Array[T]

using System;
using Cybertron.Transformers;

public class TransformersTest{

  public static void GetReadyForBattle(Transformer[] robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    Autobot[] autobots = new Autobot[1];
    autobots[0] = OptimusPrime;

    Decepticon Megatron = new Decepticon();
    Decepticon[] decepticons = new Decepticon[1];
    decepticons[0] = Megatron;

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

DOESN'T WORK - List<subtype of T> implicitly casted to List<T>

using System;
using System.Collections.Generic;
using Cybertron.Transformers;

public class TransformersTest{

  public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
    if(!bot.InRobotMode)
        bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

The reason this doesn't work has been explained ad nauseam by various members of the CLR and C# teams, such as Rick Byers in his post Generic type parameter variance in the CLR where he argues

More formally, in C# v2.0 if T is a subtype of U, then T[] is a subtype of U[], but G<T> is not a subtype of G<U> (where G is any generic type).  In type-theory terminology, we describe this behavior by saying that C# array types are “covariant” and generic types are “invariant”. 

 

There is actually a reason why you might consider generic type invariance to be a good thing.  Consider the following code:

 

List<string> ls = new List<string>();

      ls.Add("test");

      List<object> lo = ls;   // Can't do this in C#

      object o1 = lo[0];      // ok – converting string to object

      lo[0] = new object();   // ERROR – can’t convert object to string

 

If this were allowed, the last line would have to result in a run-time type-check (to preserve type safety), which could throw an exception (eg. InvalidCastException).  This wouldn’t be the end of the world, but it would be unfortunate.

Even if I buy that there is no good way to prevent the error scenario in the above code snippet without making generic types invariant, it seems that there were a couple of ways out of the problem that were shut out by the C# language team. One approach that I was so sure would work was to create a subtype of System.Collections.Generic.List and define implicit and explicit cast operators for it. It didn't work.

WORKS! - ArrayList implicitly casted to MyList&lt;T&gt; via user-defined cast operator

using System;
using System.Collections;
using System.Collections.Generic;
using Cybertron.Transformers;


public class MyList<T>: List<T>{

  public static implicit operator MyList<T>(ArrayList target){
    MyList<T> newList = new MyList<T>();

    foreach(T item in target){
        newList.Add(item);
    }
    return newList;
  }
}

public class Test{

  public static void GetReadyForBattle(MyList<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode){
                bot.Transform();
            }
        }   
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    ArrayList autobots = new ArrayList(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    ArrayList decepticons = new ArrayList(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);
  }
}

DOESN'T WORK - MyList<subtype of T> implicitly casted to MyList<T> via user-defined cast

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class MyList<T>: List<T>{

  public static implicit operator MyList<T>(MyList<U> target) where U:T{
    MyList<T> newList = new MyList<T>();

    foreach(T item in target){
        newList.Add(item);
    }
    return newList;
  }

}

public class Test{

  public static void GetReadyForBattle(MyList<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

   
    Autobot OptimusPrime = new Autobot();
    MyList<Autobot> autobots = new MyList<Autobot>();
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    MyList<Decepticon> decepticons = new MyList<Decepticon>();
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

I really wanted that last bit of code to work because it would have been quite a non-intrusive fix for the problem (ignoring the fact that I have to use my own subclasses of the .NET Framework's collection classes). At the end of the day I ended up creating a TypeConverter utility class which contains some of the dumbest code I've had to write to trick a compiler into doing the right thing. Here's what it ended up looking like

WORKS - Create a TypeConverter class that encapsulates calls to List.ConvertAll

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class TypeConverter{

  public static List<Transformer> ToTransformerList<T>(List<T> target) where T: Transformer{
    return target.ConvertAll(new Converter<T,Transformer>(MakeTransformer));
  }

  public static Transformer MakeTransformer<T>(T target) where T:Transformer{
    return target;
/* greatest conversion code ever!!!! */
  }

}

public class Test{

public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
        }
    }

 public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(TypeConverter.ToTransformerList(decepticons));
    GetReadyForBattle(TypeConverter.ToTransformerList(autobots));

 }

}

This works but it's ugly as sin. Anybody got any better ideas?

UPDATE: Lots of great suggestions in the comments. Since I don't want to go ahead and modify a huge chunk of methods across our code base, I suspect I'll continue with the TypeConverter model. However John Spurlock pointed out that it is much smarter to implement the TypeConverter using generics for both input and output parameters instead of the way I hacked it together last night. So our code will look more like

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class TypeConverter{

  /// <summary>
  /// Returns a delegate that can be used to cast a subtype back to its base type.
  /// </summary>
  /// <typeparam name="T">The derived type</typeparam>
  /// <typeparam name="U">The base type</typeparam>
  /// <returns>Delegate that can be used to cast a subtype back to its base type.</returns>
  public static Converter<T, U> UpCast<T, U>() where T : U {
    return delegate(T item) { return (U)item; };
  }

}


public class Test{

public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
        }
    }

 public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons.ConvertAll(TypeConverter.UpCast<Decepticon, Transformer>()));
    GetReadyForBattle(autobots.ConvertAll(TypeConverter.UpCast<Autobot, Transformer>()));

 }

}
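For completeness, there is a third option that avoids any conversion at all, though it means changing the method signatures across the code base (which is exactly what I said above I didn't want to do): make the method itself generic with a constraint. A sketch, using the same hypothetical Transformers types:

using System;
using System.Collections.Generic;
using Cybertron.Transformers;

public class Test{

  // Because the method is generic with a Transformer constraint, any List<T>
  // where T derives from Transformer is accepted directly; no conversion needed.
  public static void GetReadyForBattle<T>(List<T> robots) where T : Transformer {
    foreach(T bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    // Type inference picks T = Decepticon and T = Autobot respectively
    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);
  }
}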


 

Categories: Programming

Remember back in the heyday of TiVo when the Wall Street Journal ran the story my TiVo thinks I'm gay? Well, it seems I'm facing a similar dilemma with the news feed on my Facebook home page. It has decided that Robert Scoble is the most important person in my social network. Here's a screen shot of what my news feed looked like when I logged into Facebook yesterday.

There are over a hundred people in my social network on Facebook, many tagged as coworkers, high school friends and family yet 50% of the content in my news feed is always about Robert Scoble. It would be understandable if he was the only one among my hundreds of "friends" actively using the site but that isn't the case. A quick glance at my status updates page reveals something quite astounding

Even though I've gotten status updates from about twenty people in my social network over the past day, the only person whose status updates Facebook decided are important enough to show on my home page when I log in is Robert Scoble. WTF?

Even crazier, guess who is on the list of people whose updates I've asked not to show up in my news feed unless nothing else is available?

I can only guess at why Facebook has decided to ignore my wishes and fill my news feed with content I've explicitly rejected. Perhaps their algorithms think he is my most important "friend" because he has thousands of people in his network? Perhaps they think his content will generate the most clickthroughs since they are usually videos? Either way, this is one instance where Facebook has failed to put the user in control.

If this hadn't become the primary way I keep up with a lot of folks I grew up with back in Nigeria, I'd quit using Facebook. Fricking...social lock-in.


 

Categories: Social Software

A few weeks ago, one of our execs at work asked me to think about "open" social networks. Since my day job is working on the social networking platform that underlies Windows Live Spaces and other Windows Live properties, it makes sense that if anyone at Microsoft is thinking about making our social networks "open" it should be me. However I quickly hit a snag. After some quick reading around, I realized that there isn't really a common definition of what it means for a social networking service to be "open". Instead, it seems we have a collection of pet peeves that various aggrieved parties like to blame on lack of openness. For example, read the Wired article Slap in the Facebook: It's Time for Social Networks to Open Up and compare it to this post on Read/Write Web entitled PeopleAggregator and Open Social Network Systems. Both articles are about "open" social networks yet they focus on completely different things. Below are my opinions on the various definitions of "open" in the context of social networking sites

  1. Content Hosted on the Site Not Viewable By the General Public and not Indexed by Search Engines:  As a user of Facebook, I consider this a feature not a bug. I've mentioned in previous blog postings that I don't think it is a great idea that all the stuff being published by teenagers and college students on the Web today will be held against them for the rest of their lives. Especially since using search engines to do quick background searches on potential hires and dates is now commonplace. Personally, I've had several negative experiences posting personal content to the public Web including

    1. fresh out of college, I posted a blog post about almost hooking up with some girl at a nightclub and about a heated email discussion I had with someone at work. It was extremely awkward to have both topics come up in conversations with fellow coworkers over the next few days because they'd read my blog.
    2. a few months ago I posted some pictures from a recent trip to Nigeria and this ignited a firestorm of over a hundred angry comments filled with abuse and threats against me and my family because some Nigerians were upset that the president of Nigeria has domestic staff. I eventually made the pictures non-public on Flickr after conferring with my family members in Nigeria.
    3. around the same time I posted some pictures of my fiancée and I on my Windows Live Space and each picture now has a derogatory comment attached to it.

    At this point I've given up on posting personal pictures or diary like postings on the public Web. Facebook is now where I share pictures.

    When we first launched Windows Live Spaces, there was a lot of concern across the division when people realized that a significant portion of our user base was teenage girls who used the site to post personal details about themselves, including pictures of themselves and their friends. In the end we decided, like Facebook, that the default accessibility for content created by our teenage users (i.e. if they declare their age in their profile) would be for it to only be visible to people in their social network (i.e. Windows Live Messenger buddies and people in their Windows Live Spaces friends list). I think it is actually pretty slick that on Facebook, you can also create access control lists with entries like "anyone who's proved they work at Microsoft". 

  2. Inability to Export My Content from the Social Network: This is something that geeks complain about, especially since they tend to join new social networking sites on a regular basis, but for the most part there isn't a lot of end user demand for this kind of functionality based on my experience working closely with the folks behind Windows Live Spaces and keeping an eye on feedback about other social networking sites. There are two main reasons for this. The first is that there is little value in having the content that is unique to the social network site outside of the service. For example, my friends list on Facebook is only useful in the context of that site. The only use for it outside the service would be as a way to bootstrap a new friends list by spamming all my friends on Facebook to tell them to join the new site. Secondly, danah boyd has pointed out in her research that many young users of social networking sites consider their profiles to be ephemeral; to them, not being able to port your profile from MySpace to Facebook isn't a big deal because you're starting over anyway. For working professionals, things are a little different since they may have created content that has value outside the service (e.g. work-related blog postings related to their field of endeavor), so allowing data export in that context actually does serve a legitimate user need. 
  3. Full APIs for Extracting and Creating Content on the Social Network: With the growth in popularity and valuations of social networking sites, some companies have come to the conclusion that there is an opportunity for making money by becoming meta-social network sites which aggregate a user's profiles and content from multiple social networking sites. There are literally dozens of Social Network Profile aggregators today and it is hard to imagine social networking sites viewing them as anything other than leeches trying to steal their page views by treating them as dumb storage systems. This is another reason why most social network services primarily focus on building widget platforms or APIs that enable you to create content or applications hosted within the site but don't give many ways to programmatically get content out.  

    Counter examples to this kind of thinking are Flickr and YouTube which both provide lots of ways to get content in and out of their service yet became two of the fastest growing and most admired websites in their respective categories. It is clear that a well-thought out API strategy that drives people to your site while not restricting your users combined with a great user experience on your website is a winning combination. Unfortunately, it's easier said than done.

  4. Being able to Interact with People from Different Social Networks from Your Preferred Social Network: I'm on Facebook and my fiancée is on MySpace. Wouldn't it be great if we could friend each other and send private messages without both being on the same service?

    It is likely that there is a lot of unvoiced demand for this functionality but it likely won't happen anytime soon for business reasons not technical ones. I suspect that the concept of "social network interop" will eventually mirror the current situation in the instant messaging world today.

    • We'll have two or three dominant social networking services with varying popularity in different global markets with a few local markets being dominated by local products.
    • There'll be little incentive for a dominant player to want to interoperate with smaller players. If interop happens it will be between players that are roughly the same size or have around the same market strength.
    • A small percentage of power users will use services that aggregate their profiles across social networks to get the benefits of social network interoperability. The dominant social networking sites will likely ignore these services unless they start getting too popular.
    • Corporate customers may be able to cut special deals so that their usage of public social networking services does interoperate with  whatever technology they use internally.

    Since I've assumed that some level of interoperability across social networking sites is inevitable, the question then is what is this functionality and what would the API/protocols look like? Good question.


 

Database normalization is a formal process of designing your database to eliminate redundant data, utilize space efficiently and reduce update errors. Anyone who has ever taken a database class has had it drummed into their head that a normalized database is the only way to go. This is true for the most part. However there are certain scenarios where the benefits of database normalization are outweighed by its costs. Two of these scenarios are described below.

Immutable Data and Append-Only Scenarios

Pat Helland, an enterprise architect at Microsoft who just rejoined the company after a two year stint at Amazon, has a blog post entitled Normalization is for Sissies where he presents his slides from an internal Microsoft gathering on database topics. In his presentation, Pat argues that database normalization is unnecessary in situations where we are storing immutable data such as financial transactions or a particular day's price list.

When Multiple Joins are Needed to Produce a Commonly Accessed View

The biggest problem with normalization is that you end up with multiple tables representing what is conceptually a single item. For example, consider this normalized set of tables which represent a user profile on a typical social networking site.

user table
user_id | first_name | last_name | sex  | hometown    | relationship_status | interested_in | religious_views | political_views
12345   | John       | Doe       | Male | Atlanta, GA | married             | women         | (null)          | (null)

user_affiliations table
user_id (foreign key) | affiliation_id (foreign key)
12345                 | 42
12345                 | 598

affiliations table
affiliation_id | description  | member_count
42             | Microsoft    | 18,656
598            | Georgia Tech | 23,488

user_phone_numbers table
user_id (foreign key) | phone_number | phone_type
12345                 | 425-555-1203 | Home
12345                 | 425-555-6161 | Work
12345                 | 206-555-0932 | Cell

user_screen_names table
user_id (foreign key) | screen_name            | im_service
12345                 | geeknproud@example.com | AIM
12345                 | voip4life@example.org  | Skype

user_work_history table
user_id (foreign key) | company_affiliation_id (foreign key) | company_name    | job_title
12345                 | 42                                   | Microsoft       | Program Manager
12345                 | 78                                   | i2 Technologies | Quality Assurance Engineer

This is the kind of information you see on the average profile on Facebook. With the above design, it takes six SQL Join operations to access and display the information about a single user. This makes rendering the profile page a fairly database intensive operation which is compounded by the fact that profile pages are the most popular pages on social networking sites.

The simplest way to fix this problem is to denormalize the database. Instead of having tables for the user’s affiliations, phone numbers, IM addresses and so on, we can just place them in the user table as columns. The drawback with this approach is that there is now more wasted space (e.g. lots of people will have null for their work phone) and perhaps some redundant information (e.g. if we copy over the description of each affiliation into an affiliation_name column for each user to prevent having to do a join with the affiliations table). However, given the very low cost of storage versus the improved performance of querying a single table and not having to deal with SQL statements that operate across six tables for every operation, this is a small price to pay.

As Joe Gregorio mentions in his blog post about the emergence of megadata, a lot of the large Web companies such as Google, eBay and Amazon are heavily into denormalizing their databases as well as eschewing transactions when updating these databases to improve their scalability.

Maybe normalization is for sissies…

UPDATE: Someone pointed out in the comments that denormalizing the affiliations table into the user table would mean the member_count would have to be updated in thousands of users' rows whenever a new member was added to the group. This is obviously not the intent of denormalizing for performance reasons since it replaces a bad problem with a worse one. Since an affiliation is a distinct concept from a user, it makes sense for it to have its own table. Replicating the names of the groups a user is affiliated with in the user table is a good performance optimization, although it does mean that the name has to be fixed up in thousands of rows if it ever changes. Since this is likely to happen very rarely, this is probably acceptable, especially if we schedule renames to be done by a cron job during off-peak hours. On the other hand, replicating the member count is just asking for trouble.

UPDATE 2: Lots of great comments here and on reddit indicate that I should have put more context around this post. Database denormalization is the kind of performance optimization that should be carried out as a last resort after trying things like creating database indexes, using SQL views and implementing application specific in-memory caching. However if you hit massive scale and are dealing with millions of queries a day across hundreds of millions to billions of records or have decided to go with database partitioning/sharding then you will likely end up resorting to denormalization. A real-world example of this is the Flickr database back-end whose details are described in Tim O'Reilly's Database War Stories #3: Flickr which contains the following quotes

tags are an interesting one. lots of the 'web 2.0' feature set doesn't fit well with traditional normalised db schema design. denormalization (or heavy caching) is the only way to generate a tag cloud in milliseconds for hundereds of millions of tags. you can cache stuff that's slow to generate, but if it's so expensive to generate that you can't ever regenerate that view without pegging a whole database server then it's not going to work (or you need dedicated servers to generate those views - some of our data views are calculated offline by dedicated processing clusters which save the results into mysql).

federating data also means denormalization is necessary - if we cut up data by user, where do we store data which relates to two users (such as a comment by one user on another user's photo). if we want to fetch it in the context of both user's, then we need to store it in both shards, or scan every shard for one of the views (which doesn't scale). we store alot of data twice, but then theres the issue of it going out of sync. we can avoid this to some extent with two-step transactions (open transaction 1, write commands, open transaction 2, write commands, commit 1st transaction if all is well, commit 2nd transaction if 1st commited) but there still a chance for failure when a box goes down during the 1st commit.

we need new tools to check data consistency across multiple shards, move data around shards and so on - a lot of the flickr code infrastructure deals with ensuring data is consistent and well balanced and finding and repairing it when it's not."

The last point, about needing tools to check and repair data consistency across shards, is also important to consider. Denormalization means that you are now likely to deal with data inconsistencies because you are storing redundant copies of data and may not be able to update all copies of a column value simultaneously when it is changed, for a variety of reasons. Having tools in your infrastructure to support fixing up data of this sort then becomes very important.

Now playing: Bow Wow - Outta My System (feat. T-Pain)


 

Categories: Web Development

A number of people have been riffing on how “Web 2.0” is the new vendor lock-in. The week started with a post by Alex Iskold entitled Towards the Attention Economy: Will Attention Silos Ever Open Up? where he wrote

At a quick glance there maybe nothing wrong with the way things are today. For example, you can login to Amazon and see your order history, you can see what you rented on Netflix or what you bought on eBay. The problem is that the information is not readily portable and not readily available via a common interface. Because of this, managing your attention information is practically impossible.

Consider a different industry - banking. Each bank makes your recent financial transactions exportable in a few formats - pdf, comma separated, Excel, etc. An export in Excel is actually an interesting example, because it illustrates how your information can be leveraged. By exporting information from your bank and credit card into Excel you are able to take it to your financial adviser who can in turn analyze it. The point is that your financial information is portable.

On the other hand your Netflix rental history is not. You can argue that it is possible to copy and paste it out of Netflix, but the cost of doing this is prohibitive for individuals.

Of course, not every “Web 2.0” company is like Netflix and some do provide APIs for getting out your data. But I think Mark Pilgrim has a great point in his post Let’s not and say we did where he writes

Praising companies for providing APIs to get your own data out is like praising auto companies for not filling your airbags with gravel. I’m not saying data export isn’t important, it’s just aiming kinda low. You mean when I give you data, you’ll… give it back to me? People who think this is the pinnacle of freedom aren’t really worth listening to. Please, we need a Free Data movement. (Yeah I know, Tim predicted it already. I was the one who told him, at FOO Camp the month before.)

Back in the day, I thought Steve Gillmor’s AttentionTrust was a step in the direction of a Free Data movement but since then all I’ve seen out of that crowd was either irrelevant (e.g. XML formats that replace OPML blogrolls) or ill-thought out (e.g. attempting to create "business opportunities" by forming companies which act as middle men who resell your data to the Amazons and Netflixes of the world, kinda like Microsoft's Hailstorm vision). I keep wondering if we’ll ever see this Free Data movement. However there is another problem we have to face even if a Free Data movement does take hold.

In a follow up post to the piece by Alex Iskold entitled Attention mashups, Dave Winer gets to the heart of the matter in his characteristic blunt style when he writes

But whose data is it??

Seems it belongs to the users and they should be able to take it where they want. Sure Yahoo is providing a recommendation engine, that's nice (and thanks), but they also get to use my data for their own purposes. Seems like a fair trade. And I'm a paying customer of Netflix. They just lowered the price but I'd much rather have gotten a dividend in the form of being able to use my own data.

Think of the mashups that would be possible.

Wouldn't it be great to link up Match.com with movie ratings to find dates that like the same movies?

One of the bitter truths about "Web 2.0" is that your data isn't all that interesting; our data, on the other hand, is very interesting. Dave Winer’s mashup example isn’t interesting because he wants to be able to get his data out of Netflix but because he wants to be able to combine his data with everybody else’s data. This is where our “potential” Free Data movement will run into problems. The first is that a lot of “Web 2.0” websites provide value to their users via wisdom of the crowds approaches such as tagging or recommendations, which are simply not possible with a single user’s data set or with a small set of users. This leads to a tendency for the rich to get richer because, since they have the most data, they provide the most value for end users (e.g. Amazon). Another problem is that social software leads to lock-in. My buddy list on Windows Live Messenger and my list of friends in Facebook are useless to me outside the context of these applications. Although I can get all of my history and data out of these services, I lose the value I get from the fact that all my friends use these services as well. Again, my data isn’t what is interesting.

Being able to get your data out via APIs is a good first step but what is really interesting is being able to get everyone else’s data out of the service as well. Then we would have the beginnings of truly open and free data on the Web, which would lead to very, very interesting possibilities. 

Now playing: Rick Ross - Hustlin' (remix) (feat. Jay-Z & Young Jeezy)


 

Categories:

August 2, 2007
@ 02:40 AM

Yesterday, I was chatting with a former co-worker about Mary Jo Foley's article Could a startup beat Microsoft and Google to market with a ‘cloud OS’? and I pointed out that it was hard to make sense of the story because she seemed to be conflating multiple concepts then calling all of them a "cloud OS". It seems she isn’t the only one who throws around muddy definitions of this term as evidenced by C|Net articles like Cloud OS still pie in the sky and blog posts from Microsoft employees like Windows Cloud! What Would It Look Like!? 

I have no idea what Microsoft is actually working on in this space and even if I did I couldn't talk about it anyway. However I do think it is a good idea for people to have a clear idea of what they are talking about when they throw around terms like "cloud OS" or "cloud platform" so we don't end up with another useless term like SOA which means a different thing to each person who talks about it. Below are the three main ideas people often identify as a "Web OS", "cloud OS" or "cloud platform" and examples of companies executing on that vision.

WIMP Desktop Environment Implemented as a Rich Internet Application (The YouOS Strategy)

Porting the windows, icons, menus and pointer (WIMP) user interface which has defined desktop computing for the last three decades to the Web is seen by many as the logical extension of the desktop operating system. This is a throwback to Oracle's network computer of the late 1990s, where the expectation is that the average PC is not much more than a dumb terminal with enough horsepower to handle the display requirements and computational needs of whatever rich internet application platform is needed to make this work.

A great example of a product in this space is YouOS. This seems to be the definition of a "cloud OS" that is used by Ina Fried in the C|Net article Cloud OS still pie in the sky.

My thoughts on YouOS and applications like it were posted a year ago, my opinion hasn't changed since then.

Platform for Building Web-based Applications (The Amazon Strategy)

When you look at presentations on scaling popular websites like YouTube, Twitter, Flickr, eBay, etc, it seems everyone keeps hitting the same problems and reinventing the same wheels. They all start off using LAMP, thinking that’s the main platform decision they have to make. Then they eventually add on memcached or something similar to reduce disk I/O (the usual read path is sketched below). After that, they may start to hit the limits of the capabilities of relational database management systems and may start taking data out of their databases, denormalizing them or simply repartitioning/resharding them as they add new machines or clusters. Then they realize that they now have dozens of machines in their data center when they started with one or two, and managing them (i.e. patches, upgrades, hard disk crashes, dynamically adding new machines to the cluster, etc) becomes a problem.
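As an aside, the "add memcached" step in that progression usually amounts to the cache-aside pattern on the read path, sketched below with a made-up cache interface (this is not a real memcached client API):

using System;

// Made-up cache interface standing in for a memcached client.
public interface ICache {
  object Get(string key);
  void Set(string key, object value, TimeSpan expiry);
}

public class User { public int Id; }

public class UserRepository {
  readonly ICache cache;

  public UserRepository(ICache cache) { this.cache = cache; }

  public User GetUser(int userId) {
    string key = "user:" + userId;

    // 1. Try the cache first to avoid a database hit.
    User user = cache.Get(key) as User;
    if (user != null) return user;

    // 2. On a miss, load from the database and populate the cache.
    user = LoadUserFromDatabase(userId);
    cache.Set(key, user, TimeSpan.FromMinutes(10));
    return user;
  }

  User LoadUserFromDatabase(int userId) {
    // stand-in for the real database query
    User u = new User();
    u.Id = userId;
    return u;
  }
}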

Now what if someone who’d already built a massively scalable website and amassed a bunch of technologies and expertise at solving these problems decided to rent out access to their platform to startups and businesses who didn’t want to deal with a lot of the costs and complexities of building a popular website beyond deciding whether to go with LAMP or WISC? That’s what Amazon has done with Amazon Web Services such as EC2, S3, SQS and the upcoming Dynamo.

In the same way a desktop operating system provides an abstraction over the complexity of interacting directly with the hardware, Amazon’s “cloud operating system” insulates Web developers from a lot of the concerns that currently plague Web development outside of actually writing the application code and dealing with support calls from their customers.

My thoughts on Amazon’s Web Services strategy remain the same. I think this is the future of Web platforms but there is still a long way to go for it to be attractive to today’s startup or business.

NOTE: Some people have commented that it is weird for an online retailer to get into this business. This belies a lack of knowledge of the company’s history. Amazon has always been about gaining expertise at some part of the Web retailer value chain and then opening that up to others as a platform. Previous examples include the Amazon Honor System which treats their payment system as a platform, Fulfillment by Amazon which treats their warehousing and product shipping system as a platform, zShops which allows you to sell your products on their website, as well as more traditional co-branding deals where other sites reused their e-commerce platform, such as Borders.com.

Web-based Applications and APIs for Integrating with Them (The Google Strategy)

Similar to Amazon, Google has created a rich set of tools and expertise at building and managing large scale websites. Unlike Amazon, Google has not indicated an interest in renting out these technologies and expertise to startups and businesses. Instead Google has focused on using their platform to give them a competitive advantage in the time to market, scalability and capabilities of their end user applications. Consider the following… 

If I use GMail for e-mail, Google Docs & Spreadsheets for my business documents, Google Calendar for my schedule, Google Talk for talking to my friends, Google search to find things on my desktop or on the Web, and iGoogle as my start page when I get on the computer, then it could be argued that for all intents and purposes my primary operating system is Google, not Windows. Since every useful application eventually becomes a platform, Google's Web-based applications are no exception. There is now a massive list of APIs for interacting and integrating with Google's applications, which make it easier to get data into and out of Google's services (e.g. the various GData APIs) or to spread the reach of Google's services to sites they don't control (e.g. widgets like the Google AJAX Search API and the Google Maps API).
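
To give a flavor of how thin these integration points are, here is a minimal sketch of reading from a GData-style API, which is essentially an Atom feed over plain HTTP. The calendar feed URL below is illustrative rather than a documented endpoint, and the code assumes the feed is publicly readable.

    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM_NS = '{http://www.w3.org/2005/Atom}'
    # Illustrative GData-style feed URL; real services publish their own feed paths.
    feed_url = 'https://www.google.com/calendar/feeds/example%40gmail.com/public/basic'

    with urllib.request.urlopen(feed_url) as response:
        tree = ET.parse(response)

    # Entries and titles are ordinary Atom elements, so any XML-capable client can consume them.
    for entry in tree.getroot().findall(ATOM_NS + 'entry'):
        print(entry.findtext(ATOM_NS + 'title'))

Because the wire format is ordinary Atom over HTTP, any site or desktop application can become another entry point into Google's services.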

In his blog post GooOS, the Google Operating System, Jason Kottke argues that the combination of Google's various Web applications and APIs [especially if they include an office suite] plus some desktop and mobile entry points into their services is effectively a Google operating system. Considering Google's recent Web office leanings and its bundling deals with Dell and Apple, Kottke seems particularly prescient given that he wrote his post in 2004.

Now playing: Fabolous - Do The Damn Thing (feat. Young Jeezy)


 

It's been about a month and a half since we shipped the beta version of the next release of RSS Bandit codenamed ShadowCat. Since then we've been listening to user feedback and have tracked down the major bugs that were causing significant instability in the application. After a lot of research and tons of negative feedback from our users, we found out that the primary culprit for the significant increase in the number of crashes in the most recent set of releases was a known issue in the Lucene search engine which powers our search feature. If we hear back from users who've complained that these crashes are no longer an issue, we'll declare the release golden and start work on the Phoenix release.

You can grab the latest installer at RssBandit.1.5.0.15.ShadowCat.Beta.zip and let us hear your comments, complaints or kudos in the RSS Bandit forums.

Major Bug Fixes Since the Previous ShadowCat beta

  • Random crashes due to error renaming file "deleteable.new" to "deletable" or "segments.new" to "segments" in search index folder.
  • Items in Atom 0.3 feeds that have a <created> date but no <issued> date show their date as the last modified date of the feed instead of the created date.
  • Images don't show up on certain items when clicking on feed or category view if the feed uses relative links, such as in Tim Bray's feed at http://www.tbray.org/ongoing/ongoing.atom (see the sketch after this list).
  • Empty pages displayed in newspaper view when browsing multiple feeds under a category node.
  • Newly added feeds do not inherit the feed refresh rate specified in the Options dialog.
  • In certain cases, the following error message is displayed when attempting feed upload via FTP; "Feedlist upload failed with error: Passive mode not allowed on this server.."
  • Application crashes on startup with the COMException "unknown error"
  • None of the options when right-clicking on "This Feed" in feed properties is valid for newsgroups.
  • Crash because the application cannot modify the .treestate.xml configuration file
  • Crash when clicking on enclosure link in toast window
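
For the relative links fix mentioned above, the general approach is to resolve each relative URL in the feed content against the feed's base URI (xml:base where present, otherwise the feed URL itself). The sketch below illustrates that idea rather than RSS Bandit's actual code, and the image paths are made up.

    from urllib.parse import urljoin

    # Base URI for Tim Bray's feed; individual entries may also carry their own xml:base.
    feed_base = 'http://www.tbray.org/ongoing/ongoing.atom'

    for img_src in ['When/200x/2007/07/30/photo.png', '/ongoing/picture.jpg']:
        print(urljoin(feed_base, img_src))
    # -> http://www.tbray.org/ongoing/When/200x/2007/07/30/photo.png
    # -> http://www.tbray.org/ongoing/picture.jpg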

Now playing: Young Buck - I Know You Want Me (feat. Jazze Pha)


 

Categories: RSS Bandit

Via Mini-Microsoft I found the article Microsoft Investment Requires Too Much Patience - Barron's, which contains the following excerpt:

Some of the issues that worry analysts:

    1. It was clear from the presentation that many of the growth prospects will take 5-10 years to bear fruit.
    2. The company overspends ("nothing would delight analysts more than a nice big round of cost-cutting.")
    3. The businesses MSFT says it's entering (e.g. advertising and consumer electronics) are far more cut-throat than its current mix.
    4. Microsoft's focus on building internet infrastructure rather than building sites that bring in users is "backward."
    5. Bill Gates's plan to pass control of product development to Ray Ozzie "will not be a smooth one."

RE: Item #4, it seems weird for analysts to say Microsoft shouldn't invest in building internet infrastructure but should instead focus on building websites. What do they think you need to build those websites? It's not like data centers are free.

Now playing: Kanye West - Stronger (feat. Daft Punk)


 

Categories: Life in the B0rg Cube

Via Anil Dash, I've found out that R. Kelly will be releasing a sequel to Trapped in the Closet, Chapters 1 - 12 this month. How ridiculously bad is "Trapped in the Closet"? Watch the video recap from Chapter 13, which is embedded below.

The original was such a favorite in our household that when I tried to introduce my fiancée's son to the greatness that was He-Man: Season One, he rejected it and asked to watch "Trapped in the Closet" instead. I've already pre-ordered my copy from Amazon.


 

Categories: Music