It seems like I was just blogging about Windows Live Hotmail coming out of beta and it looks like there is already a substantial update to the service being rolled out. From the Windows Live Hotmail team’s blog post entitled August: Hotmail will soon bring you more of your requests, better performance we learn

We went out of beta in May, and we're already releasing something new. Today, these new features will begin to roll out gradually to all our customers over the next few weeks, so if you don't immediately see them, be patient, they're coming!

More storage! Just when you were wondering how you’d ever fill up 2 or 4 GB of mail, we’ve given you more storage. Free users will get 5 GB and paid users will get 10 GB of Hotmail storage.

Contacts de-duplication: Do you have five different entries for the same person in your Contacts? Yeah, me too, but not anymore. We’re the first webmail service to roll out “contacts de-duplication”. If you get a message from “Steve Kafka” and click “add contact” but there’s already a Steve Kafka, we’ll let you know and let you add Steve’s other e-mail address to your existing “Steve Kafka” contact entry. We’re just trying to be smarter to make your life easier and faster. There’s also a wizard you can run to clean up your existing duplicate contacts.

Accepting meeting requests: If you receive a meeting request, such as one sent from Outlook, you can now click “accept” and have it added to your Calendar. This had existed for years in MSN Hotmail, and we’re adding it to Windows Live Hotmail now.

You can turn off the Today page (if you want to). If you’d rather see your inbox immediately upon login, you have the option to turn off the page of MSN news (called the Today page). The choice is yours. 

A nice combination of new features and pet peeves fixed with this release. The contacts duplication issue is particularly annoying and one I’ve wanted to see fixed for quite a while.

So far we've seen updates to Spaces, SkyDrive, and now Mail within the past month. The summer of Windows Live is in full swing, and so far it's looking pretty good. I wonder what else Windows Live has up its sleeve?

Now playing: P. Diddy - That's Crazy (remix) (feat. Black Rob, Missy Elliott, Snoop Dogg & G-Dep)


 

Categories: Windows Live

There was an article on Ars Technica this weekend entitled Google selleth then taketh away, proving the need for DRM circumvention, which is yet another example of how users can get screwed when they bet on a platform that utilizes DRM. The article states

It's not often that Google kills off one of its services, especially one which was announced with much fanfare at a big mainstream event like CES 2006. Yet Google Video's commercial aspirations have indeed been terminated: the company has announced that it will no longer be selling video content on the site. The news isn't all that surprising, given that Google's commercial video efforts were launched in rather poor shape and never managed to take off. The service seemed to only make the news when embarrassing things happened.

Yet now Google Video has given us a gift—a "proof of concept" in the form of yet another argument against DRM—and an argument for more reasonable laws governing copyright controls.

Google contacted customers late last week to tell them that the video store was closing. The e-mail declared, "In an effort to improve all Google services, we will no longer offer the ability to buy or rent videos for download from Google Video, ending the DTO/DTR (download-to-own/rent) program. This change will be effective August 15, 2007."

The message also announced that Google Checkout would issue credits in an amount equal to what those customers had spent at the Google Video store. Why the quasi-refunds? The kicker: "After August 15, 2007, you will no longer be able to view your purchased or rented videos."

See, after Google takes its video store down, its Internet-based DRM system will no longer function. This means that customers who have built video collections with Google Video offerings will find that their purchases no longer work. This is one of the major flaws in any DRM system based on secrets and centralized authorities: when these DRM data warehouses shut down, the DRM stops working, and consumers are left with useless junk.

Furthermore, Google is not refunding the total cost of the videos. To take advantage of the credit Google is offering, you have to spend more money, and furthermore, you have to spend it with a merchant that supports Google Checkout. Meanwhile, the purchases you made are now worthless.

This isn't the first time, nor will it be the last, that some big company gives up on a product strategy tied to DRM, destroying thousands of dollars in end user investments in the process. I wonder how many more fiascos it will take before consumers wholeheartedly reject DRM or government regulators are forced to step in.

Now playing: Panjabi MC - Beware (feat. Jay-Z)


 

Categories: Technology

Disclaimer: This blog post does not reflect future product announcements, technical strategy or advice from my employer. Disregard this disclaimer at your own risk.

In my previous post Some Thoughts on Open Social Networks, I gave my perspective on various definitions of "open social network" in response to the Wired article Slap in the Facebook: It's Time for Social Networks to Open Up. However there was one aspect of the article that I overlooked when I first read it. The first page of the article ends with the following exhortation.

We would like to place an open call to the web-programming community to solve this problem. We need a new framework based on open standards. Think of it as a structure that links individual sites and makes explicit social relationships, a way of defining micro social networks within the larger network of the web.

This is a problem that interests me personally. I have a Facebook profile while my fiancée has a MySpace profile. Since I’m now an active user of Facebook, I’d like her to be able to be part of my activities on the site such as being able to view my photos, read my wall posts and leave wall posts of her own. I could ask her to create a Facebook account, but I already asked her to create a profile on Windows Live Spaces so we could be friends on that service and quite frankly I don’t think she’ll find it reasonable if I keep asking her to jump from social network to social network because I happen to try out a lot of these services as part of my day job. So how can this problem be solved in the general case?

OpenID to the Rescue

This is exactly the kind of problem that OpenID was designed to solve. The first thing to do is to make sure we all have the same general understanding of how OpenID works. It's basically the same model as Windows Live ID (formerly Microsoft Passport), Google Account Authentication for Web-Based Applications and Yahoo! Browser Based Authentication. A website redirects you to your identity provider, you authenticate yourself (i.e. log in) on your identity provider's site and then are redirected back to the referring site along with your authentication ticket. The ticket contains some information about you that can be used to uniquely identify you as well as some user data that may be of interest to the referring site (e.g. username).
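To make the redirect dance concrete, here's a minimal sketch (in C#, since that's what I've been writing lately) of how a relying party might build the authentication request URL it redirects the user to. All the endpoint and return URLs are hypothetical placeholders, and a real implementation would also need to handle discovery, associations and signature verification.

using System;
using System.Collections.Generic;
using System.Web; // for HttpUtility.UrlEncode

public class OpenIdRequestSketch{

  // Builds the URL that a relying party (e.g. Facebook) redirects the user to
  // so she can log in at her identity provider (e.g. MySpace).
  public static string BuildAuthenticationRequestUrl(string providerEndpoint, string claimedId, string returnTo, string realm){
    Dictionary<string, string> args = new Dictionary<string, string>();
    args["openid.ns"]         = "http://specs.openid.net/auth/2.0";
    args["openid.mode"]       = "checkid_setup";
    args["openid.claimed_id"] = claimedId;           // e.g. the URL of her MySpace profile
    args["openid.identity"]   = claimedId;
    args["openid.return_to"]  = returnTo;            // the provider redirects back here with the response
    args["openid.realm"]      = realm;               // the site the user is being asked to trust

    List<string> pairs = new List<string>();
    foreach(KeyValuePair<string, string> arg in args){
      pairs.Add(arg.Key + "=" + HttpUtility.UrlEncode(arg.Value));
    }
    return providerEndpoint + "?" + string.Join("&", pairs.ToArray());
  }
}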

So how does this help us? Let's say MySpace were an OpenID provider, which is a fancy way of saying that I could use my MySpace account to log in to any site that accepts OpenIDs. And now let's say Facebook were a site that accepted OpenIDs as an identification scheme. This means that I could add my fiancée to the access control list of people who can view and interact with my profile on Facebook by using the URL of her MySpace profile as my identifier for her. When she tries to access my profile for the first time, she is directed to the Facebook login page, where she has the option of logging in with her MySpace credentials. When she chooses this option she is redirected to the MySpace login page. After logging into MySpace with the proper credentials, she is redirected back to Facebook and gets a pseudo-account on the service which allows her to participate in the site without having to go through an account creation process.

Now that the user has a pseudo-account on Facebook, wouldn't it be nice if when someone clicked on them they got to see a Facebook profile? This is where OpenID Attribute Exchange can be put to use. You could define a set of required and optional attributes that are exchanged as part of social network interop using OpenID. So we can insert an extra step [which may be hidden from the user] after the user is redirected back to Facebook from MySpace, where the user's profile information is requested. Here is an example of the kind of request that could be made by Facebook after a successful log-in attempt by a MySpace user.

openid.ns.ax=http://openid.net/srv/ax/1.0
openid.ax.type.fullname=http://example.com/openid/sn_schema/fullname
openid.ax.type.gender=http://example.com/openid/sn_schema/gender
openid.ax.type.relationship_status=http://example.com/openid/sn_schema/relationship_status
openid.ax.type.location=http://example.com/openid/sn_schema/location
openid.ax.type.looking_for=http://example.com/openid/sn_schema/looking_for
openid.ax.type.fav_music=http://example.com/openid/sn_schema/fav_music
openid.ax.count.fav_music=3
openid.ax.required=fullname,gender,location
openid.ax.if_available=relationship_status,looking_for,fav_music

which could return the following results

openid.ns.ax=http://openid.net/srv/ax/1.0
openid.ax.type.fullname=http://example.com/openid/sn_schema/fullname
openid.ax.type.gender=http://example.com/openid/sn_schema/gender
openid.ax.type.relationship_status=http://example.com/openid/sn_schema/relationship_status
openid.ax.type.location=http://example.com/openid/sn_schema/location
openid.ax.type.looking_for=http://example.com/openid/sn_schema/looking_for
openid.ax.type.fav_music=http://example.com/openid/sn_schema/fav_music
openid.ax.value.fullname=Jenna
openid.ax.value.gender=F
openid.ax.value.relationship_status=Single
openid.ax.value.location=Seattle, WA, United States
openid.ax.value.looking_for=Friends
openid.ax.value.fav_music=hiphop,country,pop
openid.ax.update_url=http://www.myspace.com/url_to_send_changes_made_to_profile

With the information returned by MySpace, one can now populate a placeholder Facebook profile for the user.
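For illustration, here's one way the receiving site might copy those attributes into a property bag it can use to populate the placeholder profile. This is a sketch that assumes the openid.ax.* parameters have already been parsed into a dictionary; the attribute names are the hypothetical sn_schema ones from the example above.

using System;
using System.Collections.Generic;

public class AxResponseSketch{

  // Copies the profile attributes out of the parsed openid.ax.* response
  // parameters into a simple property bag for populating a placeholder profile.
  public static Dictionary<string, string> ExtractProfileFields(IDictionary<string, string> response){
    Dictionary<string, string> profile = new Dictionary<string, string>();
    string[] fields = { "fullname", "gender", "relationship_status", "location", "looking_for", "fav_music" };

    foreach(string field in fields){
      string value;
      if(response.TryGetValue("openid.ax.value." + field, out value)){
        profile[field] = value; // optional attributes may simply be absent from the response
      }
    }
    return profile;
  }
}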

Why This Will Never Happen

The question at the tip of your tongue is probably "If we can do this with OpenID today, how come I haven't heard of anyone doing this yet?". As usual, the primary reasons for the lack of interoperability are business related, not technical. When you look at the long list of OpenID providers, you may notice that there is no similarly long list of sites that accept OpenID credentials. In fact, no such list is readily available, because the number of them is an embarrassing fraction of the number of sites that act as OpenID providers. Why the discrepancy?

If you look around, you'll notice that the major online services such as Yahoo! via BBAuth, Microsoft via Windows Live ID (formerly Passport), and AOL via OpenID all provide ways for third party sites to accept user credentials from their sites. This increases the value of having an account on these services: now that I have a Windows Live ID I can log in not only to various Microsoft properties across MSN and Windows Live but also to non-Microsoft sites like Expedia. This increases the likelihood that I'll get an account with the service, which makes it more likely that I'll be a regular user of the service, which means $$$. On the other hand, accepting OpenIDs does the exact opposite. It actually reduces the incentive to create an account on the site, which reduces the likelihood that I'll be a regular user of the site, which means less $$$. Why do you think there is no OpenID link on the AOL sign-in page even though the company is quick to brag about creating 63 million OpenIDs?

Why would Facebook implement a feature that reduced their user growth via network effects? Why would MySpace make it easy for sites to extract user profile information from their service? Because openness is great? Yeah…right.

Openness isn't why Facebook is currently being valued at $6 billion, nor is it why MySpace is currently expected to pull in about half a billion dollars in revenue this year. These companies are doing just great as walled gardens and, thanks to network effects, they will probably continue to do so unless something really disruptive happens.

PS: Marc Canter asks if I can attend the Data Sharing Summit on Sept. 7th–8th. I'm not sure I can since my wedding + honeymoon is next month. Consider this my contribution to the conversation if I don't make it.

Now playing: Wu-Tang Clan - Can It Be All So Simple


 

Today some guy in the hallway mistook me for the other black guy that works in our building. Like we all look alike. Or there can only be one black guy that works in a building at Microsoft. Must be a quota. :)

Then I find this video in my RSS feeds and, surprisingly, my name is mentioned in the comment threads.

Too bad it wasn't funny.


 

The speculation on LiveSide was right. Windows Live Folders is now Windows Live SkyDrive. You can catch the announcement on the product team's blog post Introducing Windows Live SkyDrive! which states

It’s been a month and a half since our first release, and today we’re making three major announcements!

First, we’re happy to announce our new name:



Second, we’ve been listening intently to your feedback and suggestions, and based directly on that feedback, we’re excited to bring you our next release, featuring:

  • An upgraded look and feel — new graphics to go along with your new features!
  • "Also on SkyDrive" — easily get back to the SkyDrives you’ve recently visited
  • Thumbnail images — we heard you loud and clear, and now you can see thumbnails of your image files
  • Drag and drop your files — sick of our five-at-a-time upload limit? Drag and drop your files right onto your SkyDrive
  • Embed your stuff anywhere — with just a few clicks, post your files and folders anywhere you can post html

Third, we’re excited to introduce SkyDrive in two additional regions: UK and India.

It's great to see this getting out to the general public. It's been pretty sweet watching this come together over the past year. I worked on some of the storage and permissioning platform aspects of this last year, and I was quite impressed by a lot of the former members of the Microsoft Max team who are now working on this product.

We definitely have a winner here.  Check it out.

UPDATE: Someone asked for a video or screencast of the site in action. There's one on the Windows Vista team blog. It is embedded below.


Demo: Windows Live SkyDrive

Now playing: 50 Cent - Outta Control (remix) (feat. Mobb Deep)


 

Categories: Windows Live

This weekend, I finally decided to step into the 21st century and began the process of migrating RSS Bandit to v2.0 of the .NET Framework. In addition, we've also moved our source code repository from CVS to Subversion, and so far it's been a marked improvement. Since the .NET Framework is currently on v3.0 and v3.5 is in beta 1, I'm fairly out of date when it comes to the pet peeves in my favorite programming tools. At least one of my pet peeves was fixed: in Visual Studio 2005 I finally have an IDE where "Find References to this Method" actually works. On the flip side, the introduction of generics has added a lot more frustrating moments than I expected. By now most .NET developers have seen the dreaded

Cannot convert from 'System.Collections.Generic.List<subtype of T>' to 'System.Collections.Generic.List<T>'

For those of you who aren't familiar with C# 2.0, here are examples of code that works and code that doesn't work. The difference is often subtle enough to be quite irritating when you first encounter it.

WORKS! - Array[subtype of T] implicitly cast to Array[T]

using System;
using Cybertron.Transformers;

public class TransformersTest{

  public static void GetReadyForBattle(Transformer[] robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    Autobot[] autobots = new Autobot[1];
    autobots[0] = OptimusPrime;

    Decepticon Megatron = new Decepticon();
    Decepticon[] decepticons = new Decepticon[1];
    decepticons[0] = Megatron;

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

DOESN'T WORK - List<subtype of T> implicitly cast to List<T>

using System;
using System.Collections.Generic;
using Cybertron.Transformers;

public class TransformersTest{

  public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

The reason this doesn't work has been explained ad nauseam by various members of the CLR and C# teams, such as in Rick Byers' post Generic type parameter variance in the CLR where he argues

More formally, in C# v2.0 if T is a subtype of U, then T[] is a subtype of U[], but G<T> is not a subtype of G<U> (where G is any generic type). In type-theory terminology, we describe this behavior by saying that C# array types are "covariant" and generic types are "invariant".

There is actually a reason why you might consider generic type invariance to be a good thing. Consider the following code:

List<string> ls = new List<string>();
ls.Add("test");
List<object> lo = ls;   // Can't do this in C#
object o1 = lo[0];      // ok – converting string to object
lo[0] = new object();   // ERROR – can't convert object to string

If this were allowed, the last line would have to result in a run-time type-check (to preserve type safety), which could throw an exception (eg. InvalidCastException).  This wouldn’t be the end of the world, but it would be unfortunate.

Even if I buy that there is no good way to prevent the error scenario in the above code snippet without making generic types invariant, it seems that there were a couple of ways out of the problem that were shut out by the C# language team. One approach I was sure would work was to create a subtype of System.Collections.Generic.List<T> and define implicit and explicit cast operators for it. It didn't work.

WORKS! - ArrayList implicitly cast to MyList<T> via user-defined cast operator

using System;
using System.Collections;
using System.Collections.Generic;
using Cybertron.Transformers;


public class MyList<T>: List<T>{

  public static implicit operator MyList<T>(ArrayList target){
    MyList<T> newList = new MyList<T>();

    foreach(T item in target){
        newList.Add(item);
    }
    return newList;
  }
}

public class Test{

  public static void GetReadyForBattle(MyList<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    ArrayList autobots = new ArrayList(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    ArrayList decepticons = new ArrayList(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);
  }
}

DOESN'T WORK - MyList<subtype of T> implicitly cast to MyList<T> via user-defined cast

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class MyList<T>: List<T>{

  // DOESN'T COMPILE: C# doesn't allow a conversion operator to declare its own type parameter
  public static implicit operator MyList<T>(MyList<U> target) where U:T{
    MyList<T> newList = new MyList<T>();

    foreach(T item in target){
        newList.Add(item);
    }
    return newList;
  }

}

public class Test{

  public static void GetReadyForBattle(MyList<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    MyList<Autobot> autobots = new MyList<Autobot>();
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    MyList<Decepticon> decepticons = new MyList<Decepticon>();
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

I really wanted that last bit of code to work because it would have been quite a non-intrusive fix for the problem (ignoring the fact that I'd have to use my own subclasses of the .NET Framework's collection classes). At the end of the day I ended up creating a TypeConverter utility class which contains some of the dumbest code I've had to write to trick a compiler into doing the right thing. Here's what it ended up looking like

WORKS - Create a TypeConverter class that encapsulates calls to List.ConvertAll

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class TypeConverter{

  public static List<Transformer> ToTransformerList<T>(List<T> target) where T: Transformer{
    return target.ConvertAll(new Converter<T,Transformer>(MakeTransformer));
  }

  public static Transformer MakeTransformer<T>(T target) where T:Transformer{
    return target; /* greatest conversion code ever!!!! */
  }

}

public class Test{

public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
}

 public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(TypeConverter.ToTransformerList(decepticons));
    GetReadyForBattle(TypeConverter.ToTransformerList(autobots));

 }

}

This works but it's ugly as sin. Anybody got any better ideas?

UPDATE: Lots of great suggestions in the comments. Since I don't want to go ahead and modify a huge chunk of methods across our code base, I suspect I'll continue with the TypeConverter model. However, John Spurlock pointed out that it is much smarter to implement the TypeConverter using generics for both the input and output parameters instead of the way I hacked it together last night. So our code will look more like

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class TypeConverter{

  /// <summary>
  /// Returns a delegate that can be used to cast a subtype back to its base type.
  /// </summary>
  /// <typeparam name="T">The derived type</typeparam>
  /// <typeparam name="U">The base type</typeparam>
  /// <returns>Delegate that can be used to cast a subtype back to its base type.</returns>
  public static Converter<T, U> UpCast<T, U>() where T : U {
    return delegate(T item) { return (U)item; };
  }

}


public class Test{

public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
}

 public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons.ConvertAll(TypeConverter.UpCast<Decepticon, Transformer>()));
    GetReadyForBattle(autobots.ConvertAll(TypeConverter.UpCast<Autobot, Transformer>()));

 }

}


 

Categories: Programming

Remember back in the heyday of TiVo when the Wall Street Journal ran the story my TiVo thinks I'm gay? Well, it seems I'm facing a similar dilemma with the news feed on my Facebook home page. It has decided that Robert Scoble is the most important person in my social network. Here's a screen shot of what my news feed looked like when I logged into Facebook yesterday.

There are over a hundred people in my social network on Facebook, many tagged as coworkers, high school friends and family yet 50% of the content in my news feed is always about Robert Scoble. It would be understandable if he was the only one among my hundreds of "friends" actively using the site but that isn't the case. A quick glance at my status updates page reveals something quite astounding

Even though I've gotten status updates from about twenty people in my social network over the past day, the only person whose status updates Facebook decided are important enough to show on my home page when I log in is Robert Scoble. WTF?

Even crazier, guess who is on the list of people whose updates I've asked not to show up in my news feed unless nothing else is available?

I can only guess at why Facebook has decided to ignore my wishes and fill my news feed with content I've explicitly rejected. Perhaps their algorithms think he is my most important "friend" because he has thousands of people in his network? Perhaps they think his content will generate the most clickthroughs since his posts are usually videos? Either way, this is one instance where Facebook has failed to put the user in control.

If this hadn't become the primary way I keep up with a lot of folks I grew up with back in Nigeria, I'd quit using Facebook. Fricking...social lock-in.


 

Categories: Social Software

A few weeks ago, one of our execs at work asked me to think about "open" social networks. Since my day job is working on the social networking platform that underlies Windows Live Spaces and other Windows Live properties, it makes sense that if anyone at Microsoft is thinking about making our social networks "open", it should be me. However I quickly hit a snag. After some quick reading around, I realized that there isn't really a common definition of what it means for a social networking service to be "open". Instead, it seems we have a collection of pet peeves that various aggrieved parties like to blame on a lack of openness. For example, read the Wired article Slap in the Facebook: It's Time for Social Networks to Open Up and compare it to this post on Read/Write Web entitled PeopleAggregator and Open Social Network Systems. Both articles are about "open" social networks, yet they focus on completely different things. Below are my opinions on the various definitions of "open" in the context of social networking sites:

  1. Content Hosted on the Site Not Viewable by the General Public and Not Indexed by Search Engines: As a user of Facebook, I consider this a feature, not a bug. I've mentioned in previous blog postings that I don't think it is a great idea that all the stuff being published by teenagers and college students on the Web today will be held against them for the rest of their lives, especially since using search engines to do quick background searches on potential hires and dates is now commonplace. Personally, I've had several negative experiences posting personal content to the public Web, including:

    1. fresh out of college, I posted a blog post about almost hooking up with some girl at a nightclub and about a heated email discussion I had with someone at work. It was extremely awkward to have both topics come up in conversations with fellow coworkers over the next few days because they'd read my blog.
    2. a few months ago I posted some pictures from a recent trip to Nigeria, which ignited a firestorm of over a hundred angry comments filled with abuse and threats against me and my family because some Nigerians were upset that the president of Nigeria has domestic staff. I eventually made the pictures non-public on Flickr after conferring with my family members in Nigeria.
    3. around the same time I posted some pictures of my fiancée and me on my Windows Live Space, and each picture now has a derogatory comment attached to it.

    At this point I've given up on posting personal pictures or diary-like postings on the public Web. Facebook is now where I share pictures.

    When we first launched Windows Live Spaces, there was a lot of concern across the division when people realized that a significant portion of our user base was teenage girls who used the site to post personal details about themselves, including pictures of themselves and their friends. In the end we decided, like Facebook, that the default accessibility for content created by our teenage users (i.e. those who declare their age in their profile) would be for it to only be visible to people in their social network (i.e. Windows Live Messenger buddies and people in their Windows Live Spaces friends list). I think it is actually pretty slick that on Facebook you can also create access control lists with entries like "anyone who's proved they work at Microsoft".

  2. Inability to Export My Content from the Social Network: This is something that geeks complain about, especially since they tend to join new social networking sites on a regular basis, but for the most part there isn't a lot of end user demand for this kind of functionality, based on my experience working closely with the folks behind Windows Live Spaces and keeping an eye on feedback about other social networking sites. There are two main reasons for this. The first is that there is little value in having content that is unique to the social network site outside of the service. For example, my friends list on Facebook is only useful in the context of that site; the only use for it outside the service would be to bootstrap a new friends list by spamming all my friends on Facebook to tell them to join the new site. Secondly, danah boyd has pointed out in her research that many young users of social networking sites consider their profiles to be ephemeral; to them, not being able to port a profile from MySpace to Facebook isn't a big deal because they're starting over anyway. For working professionals, things are a little different since they may have created content that has value outside the service (e.g. work-related blog postings related to their field of endeavor), so allowing data export in that context actually does serve a legitimate user need.
  3. Full APIs for Extracting and Creating Content on the Social Network: With the growth in popularity and valuations of social networking sites, some companies have come to the conclusion that there is an opportunity to make money by becoming meta-social network sites which aggregate a user's profiles and content from multiple social networking sites. There are literally dozens of social network profile aggregators today, and it is hard to imagine social networking sites viewing them as anything other than leeches trying to steal their page views by treating them as dumb storage systems. This is another reason why most social network services primarily focus on building widget platforms or APIs that enable you to create content or applications hosted within the site but don't give many ways to programmatically get content out.

    Counterexamples to this kind of thinking are Flickr and YouTube, which both provide lots of ways to get content in and out of their services, yet became two of the fastest growing and most admired websites in their respective categories. It is clear that a well-thought-out API strategy that drives people to your site while not restricting your users, combined with a great user experience on your website, is a winning combination. Unfortunately, it's easier said than done.

  4. Being able to Interact with People from Different Social Networks from Your Preferred Social Network: I'm on Facebook and my fiancée is on MySpace. Wouldn't it be great if we could friend each other and send private messages without both being on the same service?

    It is likely that there is a lot of unvoiced demand for this functionality, but it won't happen anytime soon for business reasons, not technical ones. I suspect that the concept of "social network interop" will eventually mirror the current situation in the instant messaging world today.

    • We'll have two or three dominant social networking services with varying popularity in different global markets, with a few local markets dominated by local products.
    • There'll be little incentive for a dominant player to want to interoperate with smaller players. If interop happens it will be between players that are roughly the same size or have around the same market strength.
    • A small percentage of power users will use services that aggregate their profiles across social networks to get the benefits of social network interoperability. The dominant social networking sites will likely ignore these services unless they start getting too popular.
    • Corporate customers may be able to cut special deals so that their usage of public social networking services does interoperate with whatever technology they use internally.

    Since I've assumed that some level of interoperability across social networking sites is inevitable, the question then is what is this functionality and what would the API/protocols look like? Good question.


 

Database normalization is a formal process of designing your database to eliminate redundant data, utilize space efficiently and reduce update errors. Anyone who has ever taken a database class has had it drummed into their head that a normalized database is the only way to go. This is true for the most part. However, there are certain scenarios where the benefits of database normalization are outweighed by its costs. Two of these scenarios are described below.

Immutable Data and Append-Only Scenarios

Pat Helland, an enterprise architect at Microsoft who just rejoined the company after a two-year stint at Amazon, has a blog post entitled Normalization is for Sissies where he presents his slides from an internal Microsoft gathering on database topics. In his presentation, Pat argues that database normalization is unnecessary in situations where we are storing immutable data such as financial transactions or a particular day's price list.

When Multiple Joins are Needed to Produce a Commonly Accessed View

The biggest problem with normalization is that you end up with multiple tables representing what is conceptually a single item. For example, consider this normalized set of tables which represent a user profile on a typical social networking site.

user table
user_id | first_name | last_name | sex  | hometown    | relationship_status | interested_in | religious_views | political_views
12345   | John       | Doe       | Male | Atlanta, GA | married             | women         | (null)          | (null)

user_affiliations table
user_id (foreign key) | affiliation_id (foreign key)
12345                 | 42
12345                 | 598

affiliations table
affiliation_id | description  | member_count
42             | Microsoft    | 18,656
598            | Georgia Tech | 23,488

user_phone_numbers table
user_id (foreign key) | phone_number | phone_type
12345                 | 425-555-1203 | Home
12345                 | 425-555-6161 | Work
12345                 | 206-555-0932 | Cell

user_screen_names table
user_id (foreign key) | screen_name            | im_service
12345                 | geeknproud@example.com | AIM
12345                 | voip4life@example.org  | Skype

user_work_history table
user_id (foreign key) | company_affiliation_id (foreign key) | company_name    | job_title
12345                 | 42                                   | Microsoft       | Program Manager
12345                 | 78                                   | i2 Technologies | Quality Assurance Engineer

This is the kind of information you see on the average profile on Facebook. With the above design, it takes six SQL Join operations to access and display the information about a single user. This makes rendering the profile page a fairly database intensive operation which is compounded by the fact that profile pages are the most popular pages on social networking sites.
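To make that concrete, here's roughly what the query behind the profile page would look like against the normalized schema above. This is an illustrative sketch rather than code from any real site; in practice the lookup might be split into several smaller queries, but six joins are involved either way.

SELECT u.first_name, u.last_name, u.hometown, a.description,
       p.phone_number, p.phone_type, s.screen_name, s.im_service,
       w.company_name, w.job_title
FROM user u
LEFT JOIN user_affiliations ua  ON ua.user_id = u.user_id
LEFT JOIN affiliations a        ON a.affiliation_id = ua.affiliation_id
LEFT JOIN user_phone_numbers p  ON p.user_id = u.user_id
LEFT JOIN user_screen_names s   ON s.user_id = u.user_id
LEFT JOIN user_work_history w   ON w.user_id = u.user_id
LEFT JOIN affiliations wa       ON wa.affiliation_id = w.company_affiliation_id
WHERE u.user_id = 12345;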

The simplest way to fix this problem is to denormalize the database. Instead of having separate tables for the user's affiliations, phone numbers, IM addresses and so on, we can just place them in the user table as columns. The drawback of this approach is that there is now more wasted space (e.g. lots of people will have null for their work_phone) and perhaps some redundant information (e.g. if we copy the description of each affiliation into an affiliation_name column for each user to avoid a join with the affiliations table). However, given the very low cost of storage versus the improved performance of querying a single table instead of six, this is a small price to pay.
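For illustration, a denormalized version of the schema might look something like the sketch below. The column types and the idea of flattening multi-valued data into delimited columns are choices I made up for this example, not a recommendation; the point is that rendering the profile becomes a single-table SELECT ... WHERE user_id = 12345.

CREATE TABLE user_profiles (
  user_id             INT PRIMARY KEY,
  first_name          VARCHAR(64),
  last_name           VARCHAR(64),
  sex                 VARCHAR(8),
  hometown            VARCHAR(128),
  relationship_status VARCHAR(32),
  interested_in       VARCHAR(32),
  religious_views     VARCHAR(64),
  political_views     VARCHAR(64),
  affiliation_names   VARCHAR(255),  -- e.g. 'Microsoft; Georgia Tech', copied from affiliations
  home_phone          VARCHAR(20),   -- null for the many users without one
  work_phone          VARCHAR(20),
  cell_phone          VARCHAR(20),
  aim_screen_name     VARCHAR(64),
  skype_screen_name   VARCHAR(64),
  work_history        VARCHAR(512)   -- flattened company/job title pairs
);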

As Joe Gregorio mentions in his blog post about the emergence of megadata, a lot of the large Web companies such as Google, eBay and Amazon are heavily into denormalizing their databases as well as eschewing transactions when updating these databases to improve their scalability.

Maybe normalization is for sissies…

UPDATE: Someone pointed out in the comments that denormalizing the affiliations table into the user table would mean the member_count would have to be updated in thousands of users' rows whenever a new member joined the group. This is obviously not the intent of denormalizing for performance, since it replaces a bad problem with a worse one. Since an affiliation is a distinct concept from a user, it makes sense for it to have its own table. Replicating the names of the groups a user is affiliated with in the user table is a good performance optimization, although it does mean that the name has to be fixed up in thousands of rows if it ever changes. Since this is likely to happen very rarely, it is probably acceptable, especially if we schedule renames to be done by a cron job during off-peak hours. On the other hand, replicating the member count is just asking for trouble.

UPDATE 2: Lots of great comments here and on reddit indicate that I should have put more context around this post. Database denormalization is the kind of performance optimization that should be carried out as a last resort, after trying things like creating database indexes, using SQL views and implementing application-specific in-memory caching. However, if you hit massive scale and are dealing with millions of queries a day across hundreds of millions to billions of records, or have decided to go with database partitioning/sharding, then you will likely end up resorting to denormalization. A real-world example of this is the Flickr database back-end, whose details are described in Tim O'Reilly's Database War Stories #3: Flickr, which contains the following quotes

tags are an interesting one. lots of the 'web 2.0' feature set doesn't fit well with traditional normalised db schema design. denormalization (or heavy caching) is the only way to generate a tag cloud in milliseconds for hundreds of millions of tags. you can cache stuff that's slow to generate, but if it's so expensive to generate that you can't ever regenerate that view without pegging a whole database server then it's not going to work (or you need dedicated servers to generate those views - some of our data views are calculated offline by dedicated processing clusters which save the results into mysql).

federating data also means denormalization is necessary - if we cut up data by user, where do we store data which relates to two users (such as a comment by one user on another user's photo)? if we want to fetch it in the context of both users, then we need to store it in both shards, or scan every shard for one of the views (which doesn't scale). we store a lot of data twice, but then there's the issue of it going out of sync. we can avoid this to some extent with two-step transactions (open transaction 1, write commands; open transaction 2, write commands; commit 1st transaction if all is well; commit 2nd transaction if 1st committed) but there's still a chance for failure when a box goes down during the 1st commit.

we need new tools to check data consistency across multiple shards, move data around shards and so on - a lot of the flickr code infrastructure deals with ensuring data is consistent and well balanced and finding and repairing it when it's not.

The point about needing tools to find and repair inconsistent data is also important to consider. Denormalization means you are now likely to deal with data inconsistencies, because you are storing redundant copies of data and, for a variety of reasons, may not be able to update all copies of a column value simultaneously when it changes. Having tools in your infrastructure to support fixing up data of this sort then becomes very important.
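For the curious, the "two-step transaction" described in the quote maps to something like the sketch below. I'm using C# and SQL Server syntax purely for illustration (Flickr actually runs on MySQL and PHP); the connection strings and the SQL statement are hypothetical placeholders.

using System;
using System.Data.SqlClient;

public class TwoShardWriteSketch{

  // Write the same row to both users' shards, committing the second
  // transaction only after the first commit succeeds. Note the remaining
  // failure window: a crash between the two commits still leaves the shards
  // out of sync, which is why consistency checking tools are needed.
  public static void WriteToBothShards(string shard1ConnString, string shard2ConnString, string insertSql){
    using (SqlConnection conn1 = new SqlConnection(shard1ConnString))
    using (SqlConnection conn2 = new SqlConnection(shard2ConnString)){
      conn1.Open();
      conn2.Open();

      SqlTransaction tx1 = conn1.BeginTransaction(); // open transaction 1
      SqlTransaction tx2 = conn2.BeginTransaction(); // open transaction 2

      using (SqlCommand cmd1 = new SqlCommand(insertSql, conn1, tx1))
        cmd1.ExecuteNonQuery(); // write commands to the 1st shard
      using (SqlCommand cmd2 = new SqlCommand(insertSql, conn2, tx2))
        cmd2.ExecuteNonQuery(); // write commands to the 2nd shard

      // if anything above threw, disposing the connections rolls both back
      tx1.Commit(); // commit the 1st transaction if all is well
      tx2.Commit(); // commit the 2nd if the 1st committed; a crash right here causes drift
    }
  }
}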

Now playing: Bow Wow - Outta My System (feat. T-Pain)


 

Categories: Web Development

A number of people have been riffing on how "Web 2.0" is the new vendor lock-in. The week started with a post by Alex Iskold entitled Towards the Attention Economy: Will Attention Silos Ever Open Up? where he wrote

At a quick glance there may be nothing wrong with the way things are today. For example, you can login to Amazon and see your order history, you can see what you rented on Netflix or what you bought on eBay. The problem is that the information is not readily portable and not readily available via a common interface. Because of this, managing your attention information is practically impossible.

Consider a different industry - banking. Each bank makes your recent financial transactions exportable in a few formats - pdf, comma separated, Excel, etc. An export in Excel is actually an interesting example, because it illustrates how your information can be leveraged. By exporting information from your bank and credit card into Excel you are able to take it to your financial adviser who can in turn analyze it. The point is that your financial information is portable.

On the other hand your Netflix rental history is not. You can argue that it is possible to copy and paste it out of Netflix, but the cost of doing this is prohibitive for individuals.

Of course, not every "Web 2.0" company is like Netflix, and some do provide APIs for getting your data out. But I think Mark Pilgrim has a great point in his post Let's not and say we did where he writes

Praising companies for providing APIs to get your own data out is like praising auto companies for not filling your airbags with gravel. I’m not saying data export isn’t important, it’s just aiming kinda low. You mean when I give you data, you’ll… give it back to me? People who think this is the pinnacle of freedom aren’t really worth listening to. Please, we need a Free Data movement. (Yeah I know, Tim predicted it already. I was the one who told him, at FOO Camp the month before.)

Back in the day, I thought Steve Gillmor's AttentionTrust was a step in the direction of a Free Data movement, but since then everything I've seen out of that crowd has been either irrelevant (e.g. XML formats that replace OPML blogrolls) or ill-thought-out (e.g. attempting to create "business opportunities" by forming companies which act as middlemen that resell your data to the Amazons and Netflixes of the world, kinda like Microsoft's Hailstorm vision). I keep wondering if we'll ever see this Free Data movement. However, there is another problem we have to face even if a Free Data movement does take hold.

In a follow-up post to the piece by Alex Iskold entitled Attention mashups, Dave Winer gets to the heart of the matter in his characteristically blunt style when he writes

But whose data is it??

Seems it belongs to the users and they should be able to take it where they want. Sure Yahoo is providing a recommendation engine, that's nice (and thanks), but they also get to use my data for their own purposes. Seems like a fair trade. And I'm a paying customer of Netflix. They just lowered the price but I'd much rather have gotten a dividend in the form of being able to use my own data.

Think of the mashups that would be possible.

Wouldn't it be great to link up Match.com with movie ratings to find dates that like the same movies?

One of the bitter truths about "Web 2.0" is that your data isn't all that interesting; our data, on the other hand, is very interesting. Dave Winer's mashup example isn't interesting because he wants to be able to get his data out of Netflix, but because he wants to be able to combine his data with everybody else's. This is where our "potential" Free Data movement will run into problems. The first is that a lot of "Web 2.0" websites provide value to their users via wisdom-of-the-crowds approaches such as tagging or recommendations, which are simply not possible with a single user's data set or with a small set of users. This leads to a tendency for the rich to get richer: since they have the most data, they provide the most value for end users (e.g. Amazon). Another problem is that social software leads to lock-in. My buddy list on Windows Live Messenger and my list of friends on Facebook are useless to me outside the context of these applications. Although I can get all of my history and data out of these services, I lose the value I get from the fact that all my friends use these services as well. Again, my data isn't what is interesting.

Being able to get your data out via APIs is a good first step, but what is really interesting is being able to get everyone else's data out of the service as well. Then we would have the beginnings of truly open and free data on the Web, which would lead to very, very interesting possibilities.

Now playing: Rick Ross - Hustlin' (remix) (feat. Jay-Z & Young Jeezy)


 

Categories: