Today some guy in the hallway mistook me for the other black guy that works in our building. Like we all look alike. Or there can only be one black guy that works in a building at Microsoft. Must be a quota. :)

Then I found this video in my RSS feeds and was surprised to see my name mentioned in the comment threads.

Too bad it wasn't funny.


 

The speculation on LiveSide was right. Windows Live Folders is now Windows Live SkyDrive. You can catch the announcement on the product team's blog post Introducing Windows Live SkyDrive! which states

It’s been a month and a half since our first release, and today we’re making three major announcements!

First, we’re happy to announce our new name:



Second, we’ve been listening intently to your feedback and suggestions, and based directly on that feedback, we’re excited to bring you our next release, featuring:

  • An upgraded look and feel — new graphics to go along with your new features!
  • "Also on SkyDrive" — easily get back to the SkyDrives you’ve recently visited
  • Thumbnail images — we heard you loud and clear, and now you can see thumbnails of your image files
  • Drag and drop your files — sick of our five-at-a-time upload limit? Drag and drop your files right onto your SkyDrive
  • Embed your stuff anywhere — with just a few clicks, post your files and folders anywhere you can post html

Third, we’re excited to introduce SkyDrive in two additional regions: UK and India.

It's great to see this getting out to the general public. It's been pretty sweet watching this come together over the past year. I worked on some of the storage and permissioning platform aspects of this last year and I was quite impressed by a lot of the former members of the Microsoft Max team who are now working on this product.

We definitely have a winner here.  Check it out.

UPDATE: Someone asked for a video or screencast of the site in action. There's one on the Windows Vista team blog; it is embedded below.


Demo: Windows Live SkyDrive

Now playing: 50 Cent - Outta Control (remix) (feat. Mobb Deep)


 

Categories: Windows Live

This weekend, I finally decided to step into the 21st century and began the process of migrating RSS Bandit to v2.0 of the .NET Framework. In addition, we've moved our source code repository from CVS to Subversion, and so far it's been a marked improvement. Since the .NET Framework is currently on v3.0 and v3.5 is in beta 1, my pet peeves about my favorite programming tools are fairly out of date. At least one of them has been fixed: in Visual Studio 2005 I finally have an IDE where "Find References to this Method" actually works. On the flip side, the introduction of generics has caused a lot more frustrating moments than I expected. By now most .NET developers have seen the dreaded

Cannot convert from 'System.Collections.Generic.List<subtype of T>' to 'System.Collections.Generic.List<T>'

For those of you who aren't familiar with C# 2.0, here are examples of code that works and code that doesn't work. The difference is often subtle enough to be quite irritating when you first encounter it.

WORKS! - Array[subtype of T] implicitly cast to Array[T]

using System;
using Cybertron.Transformers;

public class TransformersTest{

  public static void GetReadyForBattle(Transformer[] robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    Autobot[] autobots = new Autobot[1];
    autobots[0] = OptimusPrime;

    Decepticon Megatron = new Decepticon();
    Decepticon[] decepticons = new Decepticon[1];
    decepticons[0] = Megatron;

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

DOESN'T WORK - List<subtype of T> implicitly cast to List<T>

using System;
using System.Collections.Generic;
using Cybertron.Transformers;

public class TransformersTest{

  public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

The reason this doesn't work has been explained ad nauseam by various members of the CLR and C# teams, such as Rick Byers in his post Generic type parameter variance in the CLR where he argues

More formally, in C# v2.0 if T is a subtype of U, then T[] is a subtype of U[], but G<T> is not a subtype of G<U> (where G is any generic type).  In type-theory terminology, we describe this behavior by saying that C# array types are “covariant” and generic types are “invariant”. 

 

There is actually a reason why you might consider generic type invariance to be a good thing.  Consider the following code:

 

List<string> ls = new List<string>();
ls.Add("test");
List<object> lo = ls;   // Can't do this in C#
object o1 = lo[0];      // ok – converting string to object
lo[0] = new object();   // ERROR – can’t convert object to string

 

If this were allowed, the last line would have to result in a run-time type-check (to preserve type safety), which could throw an exception (eg. InvalidCastException).  This wouldn’t be the end of the world, but it would be unfortunate.
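For contrast, C# arrays are covariant, so the equivalent array-based code compiles and the type check is deferred to run time. Here's a minimal sketch (mine, not from Rick's post) of what that looks like:

string[] strings = new string[] { "test" };
object[] objects = strings;      // allowed, since arrays are covariant
object o1 = objects[0];          // ok, reading a string as an object
objects[0] = new object();       // compiles, but throws ArrayTypeMismatchException at run time

Generic invariance trades that run-time failure for a compile-time error, which is exactly the trade-off Rick is describing.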

Even if I buy that there is no good way to prevent the error scenario in Rick's code snippet without making generic types invariant, it seems that there were a couple of ways out of the problem that were shut out by the C# language team. One approach that I was sure would work was to create a subtype of System.Collections.Generic.List and define implicit and explicit cast operators for it. It didn't work.

WORKS! - ArrayList implicitly cast to MyList<T> via user-defined cast operator

using System;
using System.Collections;
using System.Collections.Generic;
using Cybertron.Transformers;


public class MyList<T>: List<T>{

  public static implicit operator MyList<T>(ArrayList target){
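    // the foreach below casts each object in the non-generic ArrayList to T as it copies it into the new list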
    MyList<T> newList = new MyList<T>();

    foreach(T item in target){
        newList.Add(item);
    }
    return newList;
  }
}

public class Test{

  public static void GetReadyForBattle(MyList<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode){
            bot.Transform();
        }
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    ArrayList autobots = new ArrayList(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    ArrayList decepticons = new ArrayList(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);
  }
}

DOESN'T WORK - MyList<subtype of T> implicitly cast to MyList<T> via user-defined cast

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class MyList<T>: List<T>{

  public static implicit operator MyList<T>(MyList<U> target) where U:T{
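    // NOTE: this won't compile - C# 2.0 doesn't allow a conversion operator to declare its own type parameter or constraint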
    MyList<T> newList = new MyList<T>();

    foreach(T item in target){
        newList.Add(item);
    }
    return newList;
  }

}

public class Test{

  public static void GetReadyForBattle(MyList<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

  public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    MyList<Autobot> autobots = new MyList<Autobot>();
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    MyList<Decepticon> decepticons = new MyList<Decepticon>();
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons);
    GetReadyForBattle(autobots);

  }
}

I really wanted that last bit of code to work because it would have been quite a non-intrusive fix for the problem (ignoring the fact that I'd have to use my own subclasses of the .NET Framework's collection classes). At the end of the day I ended up creating a TypeConverter utility class which contains some of the dumbest code I've ever had to write to trick a compiler into doing the right thing. Here's what it ended up looking like:

WORKS - Create a TypeConverter class that encapsulates calls to List.ConvertAll

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class TypeConverter{

  public static List<Transformer> ToTransformerList<T>(List<T> target) where T: Transformer{
    return target.ConvertAll(new Converter<T,Transformer>(MakeTransformer));
  }

  public static Transformer MakeTransformer<T>(T target) where T:Transformer{
    return target; /* greatest conversion code ever!!!! */
  }

}

public class Test{

  public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

 public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(TypeConverter.ToTransformerList(decepticons));
    GetReadyForBattle(TypeConverter.ToTransformerList(autobots));

 }

}

This works but it's ugly as sin. Anybody got any better ideas?

UPDATE: Lots of great suggestions in the comments. Since I don't want to go ahead and modify a huge chunk of methods across our code base, I suspect I'll continue with the TypeConverter model. However, John Spurlock pointed out that it is much smarter to implement the TypeConverter using generics for both the input and output parameters instead of the way I hacked it together last night. So our code will look more like this:

using System;
using System.Collections.Generic;
using Cybertron.Transformers;


public class TypeConverter{

  /// <summary>
  /// Returns a delegate that can be used to cast a subtype back to its base type.
  /// </summary>
  /// <typeparam name="T">The derived type</typeparam>
  /// <typeparam name="U">The base type</typeparam>
  /// <returns>Delegate that can be used to cast a subtype back to its base type.</returns>
  public static Converter<T, U> UpCast<T, U>() where T : U {
    return delegate(T item) { return (U)item; };
  }

}


public class Test{

  public static void GetReadyForBattle(List<Transformer> robots){
    foreach(Transformer bot in robots){
        if(!bot.InRobotMode)
            bot.Transform();
    }
  }

 public static void Main(string[] args){

    Autobot OptimusPrime = new Autobot();
    List<Autobot> autobots = new List<Autobot>(1);
    autobots.Add(OptimusPrime);

    Decepticon Megatron = new Decepticon();
    List<Decepticon> decepticons = new List<Decepticon>(1);
    decepticons.Add(Megatron);

    GetReadyForBattle(decepticons.ConvertAll(TypeConverter.UpCast<Decepticon, Transformer>()));
    GetReadyForBattle(autobots.ConvertAll(TypeConverter.UpCast<Autobot, Transformer>()));

 }

}


 

Categories: Programming

Remember back in the heyday of TiVo when the Wall Street Journal ran the story my TiVo thinks I'm gay? Well, it seems I'm facing a similar dilemma with the news feed on my Facebook home page. It has decided that Robert Scoble is the most important person in my social network. Here's a screen shot of what my news feed looked like when I logged into Facebook yesterday.

There are over a hundred people in my social network on Facebook, many tagged as coworkers, high school friends and family, yet 50% of the content in my news feed is always about Robert Scoble. It would be understandable if he were the only one among my hundreds of "friends" actively using the site, but that isn't the case. A quick glance at my status updates page reveals something quite astounding.

Even though I've gotten status updates from about twenty people in my social network over the past day, the only person whose status updates Facebook decided are important enough to show on my home page when I log in is Robert Scoble. WTF?

Even crazier, guess who is on the list of people whose updates I've asked not to show up in my news feed unless nothing else is available?

I can only guess at why Facebook has decided to ignore my wishes and fill my news feed with content I've explicitly rejected. Perhaps their algorithms think he is my most important "friend" because he has thousands of people in his network? Perhaps they think his content will generate the most clickthroughs since they are usually videos? Either way, this is one instance where Facebook has failed to put the user in control.

If this hadn't become the primary way I keep up with a lot of folks I grew up with back in Nigeria, I'd quit using Facebook. Fricking...social lock-in.


 

Categories: Social Software

A few weeks ago, one of our execs at work asked me to think about "open" social networks. Since my day job is working on the social networking platform that underlies Windows Live Spaces and other Windows Live properties, it makes sense that if anyone at Microsoft is thinking about making our social networks "open" it should be me. However, I quickly hit a snag. After some quick reading around, I realized that there isn't really a common definition of what it means for a social networking service to be "open". Instead, it seems we have a collection of pet peeves that various aggrieved parties like to blame on lack of openness. For example, read the Wired article Slap in the Facebook: It's Time for Social Networks to Open Up and compare it to this post on Read/Write Web entitled PeopleAggregator and Open Social Network Systems. Both articles are about "open" social networks, yet they focus on completely different things. Below are my opinions on the various definitions of "open" in the context of social networking sites:

  1. Content Hosted on the Site Not Viewable By the General Public and Not Indexed by Search Engines:  As a user of Facebook, I consider this a feature, not a bug. I've mentioned in previous blog postings that I don't think it is a great idea that all the stuff being published by teenagers and college students on the Web today will be held against them for the rest of their lives, especially since using search engines to do quick background searches on potential hires and dates is now commonplace. Personally, I've had several negative experiences posting personal content to the public Web, including:

    1. fresh out of college, I posted a blog post about almost hooking up with some girl at a nightclub and a heated email discussion I had with someone at work. It was extremely awkward to have both topics come up in conversations with fellow coworkers over the next few days because they'd read my blog.
    2. a few months ago I posted some pictures from a recent trip to Nigeria and this ignited a firestorm of over a hundred angry comments filled with abuse and threats to me and my family because some Nigerians were upset that the president of Nigeria has domestic staff. I eventually made the pictures non-public on Flickr after conferring with my family members in Nigeria.
    3. around the same time I posted some pictures of my fiancée and me on my Windows Live Space and each picture now has a derogatory comment attached to it.

    At this point I've given up on posting personal pictures or diary like postings on the public Web. Facebook is now where I share pictures.

    When we first launched Windows Live Spaces, there was a lot of concern across the division when people realized that a significant portion of our user base was teenage girls who used the site to post personal details about themselves, including pictures of themselves and friends. In the end we decided, like Facebook, that the default accessibility for content created by our teenage users (i.e. if they declare their age in their profile) would be for it to only be visible to people in their social network (i.e. Windows Live Messenger buddies and people in their Windows Live Spaces friends list). I think it is actually pretty slick that on Facebook, you can also create access control lists with entries like "anyone who's proved they work at Microsoft". 

  2. Inability to Export My Content from the Social Network: This is something that geeks complain about, especially since they tend to join new social networking sites on a regular basis, but for the most part there isn't a lot of end user demand for this kind of functionality based on my experience working closely with the folks behind Windows Live Spaces and keeping an eye on feedback about other social networking sites. There are two main reasons for this. The first is that there is little value in having content that is unique to the social network site outside of the service. For example, my friends list on Facebook is only useful in the context of that site. The only use for it outside the service would be as a way to bootstrap a new friends list by spamming all my friends on Facebook to tell them to join the new site. Secondly, danah boyd has pointed out in her research that many young users of social networking sites consider their profiles to be ephemeral; to them, not being able to just port their profile from MySpace to Facebook isn't a big deal because they're starting over anyway. For working professionals, things are a little different since they may have created content that has value outside the service (e.g. work-related blog postings related to their field of endeavor), so allowing data export in that context actually does serve a legitimate user need. 
  3. Full APIs for Extracting and Creating Content on the Social Network: With the growth in popularity and valuations of social networking sites, some companies have come to the conclusion that there is an opportunity for making money by becoming meta-social network sites which aggregate a user's profiles and content from multiple social networking sites. There are literally dozens of Social Network Profile aggregators today and it is hard to imagine social networking sites viewing them as anything other than leeches trying to steal their page views by treating them as dumb storage systems. This is another reason why most social network services primarily focus on building widget platforms or APIs that enable you to create content or applications hosted within the site but don't give many ways to programmatically get content out.  

    Counterexamples to this kind of thinking are Flickr and YouTube, which both provide lots of ways to get content in and out of their services yet became two of the fastest growing and most admired websites in their respective categories. It is clear that a well-thought-out API strategy that drives people to your site while not restricting your users, combined with a great user experience on your website, is a winning combination. Unfortunately, it's easier said than done.

  4. Being able to Interact with People from Different Social Networks from Your Preferred Social Network: I'm on Facebook and my fiancée is on MySpace. Wouldn't it be great if we could friend each other and send private messages without both being on the same service?

    It is likely that there is a lot of unvoiced demand for this functionality, but it probably won't happen anytime soon for business reasons, not technical ones. I suspect that the concept of "social network interop" will eventually mirror the current situation in the instant messaging world.

    • We'll have two or three dominant social networking services with varying popularity in different global markets with a few local markets being dominated by local products.
    • There'll be little incentive for a dominant player to want to interoperate with smaller players. If interop happens it will be between players that are roughly the same size or have around the same market strength.
    • A small percentage of power users will use services that aggregate their profiles across social networks to get the benefits of social network interoperability. The dominant social networking sites will likely ignore these services unless they start getting too popular.
    • Corporate customers may be able to cut special deals so that their usage of public social networking services does interoperate with  whatever technology they use internally.

    Since I've assumed that some level of interoperability across social networking sites is inevitable, the question then is what is this functionality and what would the API/protocols look like? Good question.


 

Database normalization is a formal process of designing your database to eliminate redundant data, utilize space efficiently and reduce update errors. Anyone who has ever taken a database class has had it drummed into their head that a normalized database is the only way to go. This is true for the most part. However, there are certain scenarios where the benefits of database normalization are outweighed by its costs. Two of these scenarios are described below.

Immutable Data and Append-Only Scenarios

Pat Helland, an enterprise architect at Microsoft who just rejoined the company after a two-year stint at Amazon, has a blog post entitled Normalization is for Sissies where he presents his slides from an internal Microsoft gathering on database topics. In his presentation, Pat argues that database normalization is unnecessary in situations where we are storing immutable data such as financial transactions or a particular day's price list.

When Multiple Joins are Needed to Produce a Commonly Accessed View

The biggest problem with normalization is that you end up with multiple tables representing what is conceptually a single item. For example, consider this normalized set of tables which represent a user profile on a typical social networking site.

user table
  user_id | first_name | last_name | sex  | hometown    | relationship_status | interested_in | religious_views | political_views
  12345   | John       | Doe       | Male | Atlanta, GA | married             | women         | (null)          | (null)

user_affiliations table
  user_id (foreign key) | affiliation_id (foreign key)
  12345                 | 42
  12345                 | 598

affiliations table
  affiliation_id | description  | member_count
  42             | Microsoft    | 18,656
  598            | Georgia Tech | 23,488

user_phone_numbers table
  user_id (foreign key) | phone_number | phone_type
  12345                 | 425-555-1203 | Home
  12345                 | 425-555-6161 | Work
  12345                 | 206-555-0932 | Cell

user_screen_names table
  user_id (foreign key) | screen_name            | im_service
  12345                 | geeknproud@example.com | AIM
  12345                 | voip4life@example.org  | Skype

user_work_history table
  user_id (foreign key) | company_affiliation_id (foreign key) | company_name    | job_title
  12345                 | 42                                   | Microsoft       | Program Manager
  12345                 | 78                                   | i2 Technologies | Quality Assurance Engineer

This is the kind of information you see on the average profile on Facebook. With the above design, it takes six SQL join operations to access and display the information about a single user. This makes rendering the profile page a fairly database-intensive operation, which is compounded by the fact that profile pages are the most popular pages on social networking sites.

The simplest way to fix this problem is to denormalize the database. Instead of having tables for the user's affiliations, phone numbers, IM addresses and so on, we can just place them in the user table as columns. The drawback with this approach is that there is now more wasted space (e.g. lots of college students will have null for their work_phone) and perhaps some redundant information (e.g. if we copy the description of each affiliation into an affiliation_name column for each user to prevent having to do a join with the affiliations table). However, given the very low cost of storage versus the improved performance of querying a single table instead of dealing with SQL statements that operate across six tables for every operation, this is a small price to pay.
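For illustration, a denormalized user table along these lines might look like the sketch below (the extra column names are hypothetical):

user table (denormalized)
  user_id | first_name | last_name | ... | affiliation_names       | home_phone   | work_phone   | cell_phone   | aim_screen_name        | skype_screen_name
  12345   | John       | Doe       | ... | Microsoft; Georgia Tech | 425-555-1203 | 425-555-6161 | 206-555-0932 | geeknproud@example.com | voip4life@example.org

A single query against this one table now returns everything needed to render the profile, at the cost of the wasted space and redundancy described above.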

As Joe Gregorio mentions in his blog post about the emergence of megadata, a lot of the large Web companies such as Google, eBay and Amazon are heavily into denormalizing their databases as well as eschewing transactions when updating these databases to improve their scalability.

Maybe normalization is for sissies…

UPDATE: Someone pointed out in the comments that denormalizing the affiliations table into the user table would mean the member_count would have to be updated in thousands of user rows whenever a new member joined the group. This is obviously not the intent of denormalizing for performance reasons, since it replaces a bad problem with a worse one. Since an affiliation is a distinct concept from a user, it makes sense for it to have its own table. Replicating the names of the groups a user is affiliated with in the user table is a good performance optimization, although it does mean that the name has to be fixed up in thousands of rows if it ever changes. Since this is likely to happen very rarely, it is probably acceptable, especially if we schedule renames to be done by a cron job during off-peak hours. On the other hand, replicating the member count is just asking for trouble.

UPDATE 2: Lots of great comments here and on reddit indicate that I should have put more context around this post. Database denormalization is the kind of performance optimization that should be carried out as a last resort, after trying things like creating database indexes, using SQL views and implementing application-specific in-memory caching. However, if you hit massive scale and are dealing with millions of queries a day across hundreds of millions to billions of records, or have decided to go with database partitioning/sharding, then you will likely end up resorting to denormalization. A real-world example of this is the Flickr database back-end, whose details are described in Tim O'Reilly's Database War Stories #3: Flickr, which contains the following quotes

tags are an interesting one. lots of the 'web 2.0' feature set doesn't fit well with traditional normalised db schema design. denormalization (or heavy caching) is the only way to generate a tag cloud in milliseconds for hundereds of millions of tags. you can cache stuff that's slow to generate, but if it's so expensive to generate that you can't ever regenerate that view without pegging a whole database server then it's not going to work (or you need dedicated servers to generate those views - some of our data views are calculated offline by dedicated processing clusters which save the results into mysql).

federating data also means denormalization is necessary - if we cut up data by user, where do we store data which relates to two users (such as a comment by one user on another user's photo). if we want to fetch it in the context of both user's, then we need to store it in both shards, or scan every shard for one of the views (which doesn't scale). we store alot of data twice, but then theres the issue of it going out of sync. we can avoid this to some extent with two-step transactions (open transaction 1, write commands, open transaction 2, write commands, commit 1st transaction if all is well, commit 2nd transaction if 1st commited) but there still a chance for failure when a box goes down during the 1st commit.

we need new tools to check data consistency across multiple shards, move data around shards and so on - a lot of the flickr code infrastructure deals with ensuring data is consistent and well balanced and finding and repairing it when it's not."

The last part of the quote, about needing tools to check and repair data consistency, is also important to consider. Denormalization means that you are now likely to deal with data inconsistencies, because you are storing redundant copies of data and may not be able to update all copies of a column value simultaneously when it is changed, for a variety of reasons. Having tools in your infrastructure to support fixing up data of this sort then becomes very important.

Now playing: Bow Wow - Outta My System (feat. T-Pain)


 

Categories: Web Development

A number of people have been riffing on how "Web 2.0" is the new vendor lock-in. The week started with a post by Alex Iskold entitled Towards the Attention Economy: Will Attention Silos Ever Open Up? where he wrote

At a quick glance there maybe nothing wrong with the way things are today. For example, you can login to Amazon and see your order history, you can see what you rented on Netflix or what you bought on eBay. The problem is that the information is not readily portable and not readily available via a common interface. Because of this, managing your attention information is practically impossible.

Consider a different industry - banking. Each bank makes your recent financial transactions exportable in a few formats - pdf, comma separated, Excel, etc. An export in Excel is actually an interesting example, because it illustrates how your information can be leveraged. By exporting information from your bank and credit card into Excel you are able to take it to your financial adviser who can in turn analyze it. The point is that your financial information is portable.

On the other hand your Netflix rental history is not. You can argue that it is possible to copy and paste it out of Netflix, but the cost of doing this is prohibitive for individuals.

Of course, not every “Web 2.0” company is like Netflix and some do provide APIs for getting out your data. But I think Mark Pilgrim has a great point in his post Let’s not and say we did where he writes

Praising companies for providing APIs to get your own data out is like praising auto companies for not filling your airbags with gravel. I’m not saying data export isn’t important, it’s just aiming kinda low. You mean when I give you data, you’ll… give it back to me? People who think this is the pinnacle of freedom aren’t really worth listening to. Please, we need a Free Data movement. (Yeah I know, Tim predicted it already. I was the one who told him, at FOO Camp the month before.)

Back in the day, I thought Steve Gillmor's AttentionTrust was a step in the direction of a Free Data movement, but since then everything I've seen out of that crowd has been either irrelevant (e.g. XML formats that replace OPML blogrolls) or ill-thought-out (e.g. attempting to create "business opportunities" by forming companies which act as middlemen that resell your data to the Amazons and Netflixes of the world, kinda like Microsoft's Hailstorm vision). I keep wondering if we'll ever see this Free Data movement. However, there is another problem we have to face even if a Free Data movement does take hold.

In a follow-up post to the piece by Alex Iskold entitled Attention mashups, Dave Winer gets to the heart of the matter in his characteristically blunt style when he writes

But whose data is it??

Seems it belongs to the users and they should be able to take it where they want. Sure Yahoo is providing a recommendation engine, that's nice (and thanks), but they also get to use my data for their own purposes. Seems like a fair trade. And I'm a paying customer of Netflix. They just lowered the price but I'd much rather have gotten a dividend in the form of being able to use my own data.

Think of the mashups that would be possible.

Wouldn't it be great to link up Match.com with movie ratings to find dates that like the same movies?

One of the bitter truths about "Web 2.0" is that your data isn't all that interesting; our data, on the other hand, is very interesting. Dave Winer's mashup example isn't interesting because he wants to be able to get his data out of Netflix, but because he wants to be able to combine his data with everybody else's data. This is where our "potential" Free Data movement will run into problems. The first is that a lot of "Web 2.0" websites provide value to their users via wisdom-of-the-crowds approaches such as tagging or recommendations, which are simply not possible with a single user's data set or with a small set of users. This leads to a tendency for the rich to get richer, because the sites with the most data provide the most value for end users (e.g. Amazon). Another problem is that social software leads to lock-in. My buddy list on Windows Live Messenger and my list of friends in Facebook are useless to me outside the context of these applications. Although I can get all of my history and data out of these services, I lose the value I get from the fact that all my friends use these services as well. Again, my data isn't what is interesting.

Being able to get your data out via APIs is a good first step, but what is really interesting is being able to get everyone else's data out of the service as well. Then we would have the beginnings of truly open and free data on the Web, which would lead to very, very interesting possibilities. 

Now playing: Rick Ross - Hustlin' (remix) (feat. Jay-Z & Young Jeezy)


 

Categories:

August 2, 2007
@ 02:40 AM

Yesterday, I was chatting with a former co-worker about Mary Jo Foley's article Could a startup beat Microsoft and Google to market with a ‘cloud OS’? and I pointed out that it was hard to make sense of the story because she seemed to be conflating multiple concepts and then calling all of them a "cloud OS". It seems she isn’t the only one who throws around muddy definitions of this term, as evidenced by C|Net articles like Cloud OS still pie in the sky and blog posts from Microsoft employees like Windows Cloud! What Would It Look Like!? 

I have no idea what Microsoft is actually working on in this space, and even if I did I couldn't talk about it anyway. However, I do think it is a good idea for people to have a clear idea of what they are talking about when they throw around terms like "cloud OS" or "cloud platform" so we don't end up with another useless term like SOA, which means a different thing to each person who talks about it. Below are the three main ideas people often identify as a "Web OS", "cloud OS" or "cloud platform" and examples of companies executing on that vision.

WIMP Desktop Environment Implemented as a Rich Internet Application (The YouOS Strategy)

Porting the windows, icons, menus and pointer (WIMP) user interface which has defined desktop computing for the last three decades to the Web is seen by many as the logical extension of the desktop operating system. This is a throwback to Oracle's network computer of the late 1990s, where the expectation is that the average PC is not much more than a dumb terminal with enough horsepower to handle the display requirements and computational needs of whatever rich internet application platform is needed to make this work.

A great example of a product in this space is YouOS. This seems to be the definition of a "cloud OS" that is used by Ina Fried in the C|Net article Cloud OS still pie in the sky.

My thoughts on YouOS and applications like it were posted a year ago, my opinion hasn't changed since then.

Platform for Building Web-based Applications (The Amazon Strategy)

When you look at presentations on scaling popular websites like YouTube, Twitter, Flickr, eBay, etc., it seems everyone keeps hitting the same problems and reinventing the same wheels. They all start off using LAMP, thinking that's the main platform decision they have to make. Then they eventually add memcached or something similar to reduce disk I/O. After that, they may start to hit the limits of the capabilities of relational database management systems and may start taking data out of their databases, denormalizing it or simply repartitioning/resharding it as they add new machines or clusters. Then they realize that they now have dozens of machines in their data center when they started with one or two, and managing them (i.e. patches, upgrades, hard disk crashes, dynamically adding new machines to the cluster, etc.) becomes a problem.

Now what if someone who’d already built a massively scalable website and had amassed a bunch of technologies and expertise at solving these problems decided to rent out access to their platform to startups and businesses who didn’t want to deal with a lot of the costs and complexities of building a popular website beyond deciding whether to go with LAMP or WISC? That’s what Amazon has done with Amazon Web Services such as EC2, S3, SQS and the upcoming Dynamo.  

Just as a desktop operating system provides an abstraction over the complexity of interacting directly with the hardware, Amazon’s “cloud operating system” insulates Web developers from a lot of the concerns that currently plague Web development, outside of actually writing the application code and dealing with support calls from their customers.

My thoughts on Amazon’s Web Services strategy remain the same. I think this is the future of Web platforms but there is still a long way to go for it to be attractive to today’s startup or business.

NOTE: Some people have commented that it is weird for an online retailer to get into this business. This betrays a lack of knowledge of the company’s history. Amazon has always been about gaining expertise at some part of the Web retailer value chain and then opening that up to others as a platform. Previous examples include the Amazon Honor System, which treats their payment system as a platform; Fulfillment by Amazon, which treats their warehousing and product shipping system as a platform; zShops, which allows you to sell your products on their website; as well as more traditional co-branding deals where other sites reused their e-commerce platform, such as Borders.com.

Web-based Applications and APIs for Integrating with Them (The Google Strategy)

Similar to Amazon, Google has created a rich set of tools and expertise at building and managing large scale websites. Unlike Amazon, Google has not indicated an interest in renting out these technologies and expertise to startups and businesses. Instead Google has focused on using their platform to give them a competitive advantage in the time to market, scalability and capabilities of their end user applications. Consider the following… 

If I use GMail for e-mail, Google Docs & Spreadsheets for my business documents, Google Calendar for my schedule, Google Talk for talking to my friends, Google search to find things on my desktop or on the Web and iGoogle as my start page when I get on the computer then it could be argued that for all intents and purposes my primary operating system was Google not Windows. Since every useful application eventually becomes a platform, Google’s Web-based applications are no exception. There is now a massive list of APIs for interacting and integrating with Google’s applications which make it easier to get data into Google’s services (e.g. the various GData APIs) or to spread the reach of Google’s services to sites they don’t control  (e.g. widgets like the Google AJAX Search API and the Google Maps API).

In his blog post GooOS, the Google Operating System, Jason Kottke argues that the combination of Google’s various Web applications and APIs [especially if they include an office suite] plus some desktop and mobile entry points into their services is effectively a Google operating system. Considering Google’s recent Web office leanings and its bundling deals with Dell and Apple, it seems Jason Kottke was particularly prescient given that he wrote his blog post in 2004.

Now playing: Fabolous - Do The Damn Thing (feat. Young Jeezy)


 

It's been about a month and a half since we shipped the beta version of the next release of RSS Bandit, codenamed ShadowCat. Since then we've been listening to our users' feedback and have tracked down the major bugs that were causing significant instability in the application. After a lot of research and tons of negative feedback from our users, we found out that the primary culprit for the significant increase in the number of crashes in the most recent set of releases was a known issue in the Lucene search engine which powers our search feature. If we hear back from the users who’ve complained that these crashes are no longer an issue, we’ll declare the release golden and start work on the Phoenix release.

You can grab the latest installer at RssBandit.1.5.0.15.ShadowCat.Beta.zip and let us hear your comments, complaints or kudos in the RSS Bandit forums.

Major Bug Fixes Since the Previous ShadowCat beta

  • Random crashes due to error renaming file "deleteable.new" to "deletable" or "segments.new" to "segments" in search index folder.
  • Items in Atom 0.3 feeds that have a <created> date but no <issued> date show their date as the last modified date of the feed instead of the created date.
  • Images don't show up on certain items when clicking on feed or category view if the feed uses relative links such as in Tim Bray’s feed at http://www.tbray.org/ongoing/ongoing.atom.
  • Empty pages displayed in newspaper view when browsing multiple feeds under a category node.
  • Newly added feeds do not inherit the feed refresh rate specified in the Options dialog.
  • In certain cases, the following error message is displayed when attempting feed upload via FTP: "Feedlist upload failed with error: Passive mode not allowed on this server.."
  • Application crashes on startup with the COMException "unknown error".
  • None of the options when right-clicking on "This Feed" in feed properties is valid for newsgroups.
  • Crash because the application cannot modify the .treestate.xml configuration file.
  • Crash when clicking on an enclosure link in the toast window.

Now playing: Young Buck - I Know You Want Me (feat. Jazze Pha)


 

Categories: RSS Bandit

Via Mini-Microsoft I found the article Microsoft Investment Requires Too Much Patience - Barron's which contains the following excerpt

Some of the issues that worry analysts:

    1. It was clear from the presentation that many of the growth prospects will take 5-10 years to bear fruit.
    2. The company overspends ("nothing would delight analysts more than a nice big round of cost-cutting.")
    3. The businesses MSFT says it's entering (e.g. advertising and consumer electronics) are far more cut-throat than its current mix.
    4. Microsoft's focus on building internet infrastructure rather than building sites that bring in users is "backward."
    5. Bill Gates's plan to pass control of product development to Ray Ozzie "will not be a smooth one."

RE: Item #4, it seems weird for analysts to say Microsoft shouldn’t invest in building internet infrastructure but should instead focus on building websites. What do they think you need to build those websites? It’s not like data centers are free.

Now playing: Kanye West - Stronger (feat. Daft Punk)


 

Categories: Life in the B0rg Cube