Monday, 26 May 2008 - Dare Obasanjo's weblog

May 26, 2008

@ 03:43 PM

Some Thoughts on Single Instance Storage and Twitter

In response to my recent post about Twitter's architectural woes, Robert Scoble penned a missive entitled Should services charge “super users”? where he writes

First of all, Twitter doesn’t store my Tweets 25,000 times. It stores them once and then it remixes them. This is like saying that Exchange stores each email once for each user. That’s totally not true and shows a lack of understanding how these things work internally.
Second of all, why can FriendFeed keep up with the ever increasing load? I have 10,945 friends on FriendFeed (all added in the past three months, which is MUCH faster growth than Twitter had) and it’s staying up just fine.

My original post excerpted a post by Isreal L'Heureux where he pointed out that there are two naive design choices one could make while designing a system like Twitter. When Robert Scoble posts a tweet you could either

PUSH the message to the queue’s of each of his ~~6,864~~ 24,875 followers, or
Wait for the ~~6,864~~ 24,875 followers to log in, then PULL the message.

Isreal assumed it was #1 which has its set of pros and cons. The pro is that fetching a users inbox is fast and the inbox can easily be cached. The con is that the write I/O is ridiculous. Robert Scoble's post excerpted above argues that Twitter must be using #2 since that is what messaging systems like Microsoft Exchange do.

The feature of Exchange that Robert is talking about is called Single Instance Storage and is described in a Microsoft knowledge base article 175706 as follows

If a message is sent to one recipient, and if the message is copied to 20 other recipients who reside in the same mailbox store, Exchange Server maintains only one copy of the message in its database. Exchange Server then creates pointers.
...
For example, assume the following configuration:
Server 1 Mailbox Store 1: Users A, B, and C
Server 2 Mailbox Store 1: User D
When User A sends the same message to User B, User C, and User D, Exchange Server creates a single instance of the message on server 1 for all three users, because User A, User B, and User C are on the same server. Even the message that is in User A's Sent Items folder is a single instance of the messages that is in the Inboxes of User B and User C. Because User D is on a different server, Exchange Server sends a message to server 2 that will be stored on that server.

So now we know how Exchange's single instance storage works we can talk about how it helps or hinders the model where users pull messages from all the people they are following when they log-in. A pull model assumes that you have a single instance of each message sent by a Twitter user and then when Robert Scoble logs in you execute a query of the form

SELECT TOP 20 messages.author_id, messages.body, messages.publish_date
FROM  messages
WHERE messages.author_id  in (SELECT following.id from following where user_id=$scoble_id)
ORDER BY messages.publish_date DESC;

in this model there's only ever one instance of a message and it lives in the messages table. Where does this break down? Well, what happens when you have more than one database per server or more than one set of database servers or more than one data center? This is why Exchange still has multiple message instances per mailbox store even if they do have single instances shared by all recipients of the message in each mailbox store on a single server.

How much does having multiple databases or multiple DB servers affect the benefits of single instancing of messages? I'll let the Exchange team answer that with their blog post from earlier this year titled Single Instance Storage in Exchange 2007

Over the years, the importance of single instance storage (SIS) in Exchange environments has gradually declined. Ten years ago, Exchange supported about 1000 users per server, with all users on a single database. Today, a typical mailbox server might have 5000 users spread over 50 databases. With one-tenth as many users per database (100 users vs. 1000 users), the potential space savings from single-instancing is reduced by 10x. The move to multiple databases per server, along with the reduction in space savings due to items being deleted over time, means that the space savings from SIS are quite small for most customers. Because of this, we've long recommend that customers ignore SIS when planning their storage requirements for Exchange.
...
Looking forward
Given current trends, we expect the value of single instance storage to continue to decline over time. It's too early for us to say whether SIS will be around in future versions of Exchange. However, we want everyone to understand that it is being deemphasized and should not be a primary factor in today's deployment or migration plans.

At the end of the day, the cost of a naive implementation of single instancing in a situation like Twitter's is latency and ridiculous read I/O. You will have to query data from tens of thousands of users across multiple database every time Robert Scoble logs in.

Exchange gets around the latency problem with the optimization of always having an instance of the message in each database. So you end up with a mix of both a pull & push model. If Scoble blasts a message out to 25,000 recipients on an Exchange system such as the one described in the blog post by the Exchange team, it will end up being written 250 times across 5 servers (100 users per database, 5000 users per server) instead of 25,000 times as would happen in a naive implementation of the PUSH model. Then when each person logs-in to view their message it is assembled on demand from the single instance store local to their mail server instead of trying to read from 250 databases across 5 servers as would happen in a naive implementation of the PULL model.

Scaling a service infrastructure is all about optimizations and trade offs. My original post wasn't meant to imply that Twitter's scaling problems are unsolvable since similar problems have obviously been solved by other online services and Enterprise products in the messaging space. It was meant to inject some perspective in the silly claims that Ruby on Rails is to blame and give readers some perspective on the kind of architectural problems that have to be solved.

Now Playing: Lil Wayne - Got Money (feat. T-Pain)

Categories: Web Development

May 26, 2008

@ 03:42 PM

Comments [2]

Having the Right Users is More Important than Having the Right Features

An important aspect of the growth of a social software application is how well it takes advantage of network effects. Not only should users get value from using the application, they should also get value from the fact that other people use the software as well. Typical examples of technologies that require network effects to even be useful are fax machines and email. Having a fax machine or an email address is useless if no one else has one and they get more useful as more people you communicate with get one. Of course, the opposite is also true and the value of your product declines more rapidly once your user base starts to shrink.

The important balancing act for social is learning how to take as much advantage of the network effects of their application as possible yet make sure the application still provides value even without the presence of network effects. Three examples of Web applications that have done this well are Flickr, MySpace and YouTube. Even without social features, Flickr is a great photo hosting site. However its usage of tagging [which encourages discovery of similar pictures] and social networking [where you get to see your friends photo streams] allows the site to benefit from network effects. Even without a social networking component, MySpace works well as a way for people and corporate brands to represent themselves online in what I'd like to think of as "Geocities 2.0". The same goes for YouTube.

All three of these sites had to face the technology adoption curve and cross the chasm between early adopters & pragmatists/conservatives. One thing that makes social Web sites a lot different from other kinds of technology is that potential customers can not only cheaply try them out but can also can easily see the benefits that others are getting out of the site. This means that it is very important to figure out how to expose the pragmatists and conservative technology users to how much value early adopters are getting from your service. There are still only two ways of really getting this to happen "word of mouth" and explicit advertising. In a world full of "Web 2.0" sites competing for people's attention advertising is not only expensive but often ineffective. Thus word of mouth is king.

So now that we've agreed that word of mouth is important, the next thing to wonder about is how to build it into your product? The truth is that you not only have to create a great product, you also have to create passionate users. Kathy Sierra had two great charts on her site which I've included below

What is important isn't building out your checklist of features, it is making sure you have the right features with the right user experience for your audience. Although that sounds obvious, I've lost count of the amount of churn I've seen in the industry as people react to some announcement by Google, Facebook, or some cool new startup without thinking about how it first into their user experience. When you are constantly adding features to your site without a cohesive vision as to how it all hangs together, your user experience will be crap. Which means you're unlikely to get users to the point where they love your product and are actively evangelizing it for free.

In addition to building a service that turns your users into evangelists, you want to acquire customers that would be effective evangelists especially if they naturally attract an audience. For example, I realized MySpace was going to blow up like a rocket ship full of TNT when every time I went out clubbing the DJ would have the URL of their MySpace page in a prominent place on the stage. I felt similarly about Facebook whenever I heard college students enthusiastically talking about using it as a cool place to hang out online with all their other college friends and alumni. What is interesting to note is how both sites grew from two naturally influential and highly connected demographics to become massively mainstream.

One thing to always remember is that Social Software isn't about features, it is about users. It isn't about one upping the competition in feature checklists it is about capturing the right audience then building an experience that those users can't help evangelizing.

Of course, you need to ensure that when their friends do try out the service they are almost immediately impressed. That is easier said than done. For example, I heard a lot of hype about Xobni and finally got to try it out after hearing lots of enthusiastic evangelism from multiple people. After spending several minutes indexing my inbox I got an app that took up the space of the To-Do sidebar in Outlook with a sidebar full of mostly pointless trivia about my email correspondence. Unfortunately I'd rather know where and when my next meeting is occurring than that I've been on over 1000 emails with Mike Torres. So out went Xobni. Don't let the same happen to your application, as Kathy's graph above says "How soon your users can start kicking ass" directly correlates with how passionate they end up being about your application.

PS: Some people will include viral marketing as a third alternative to traditional advertising and word of mouth. I personally question the effectiveness of this technique which is why I didn't include it above.

Now Playing: Fall Out Boy - I Slept With Someone In Fall Out Boy and All I Got Was This Stupid Song Written About Me

Categories: Social Software

May 23, 2008

@ 03:23 AM

Comments [9]

Some Thoughts on Twitter's Availability Problems

As a regular user of Twitter I've felt the waves of frustration wash over me these past couple of weeks as the service has been hit by one outage after another. This led me to start pondering the problem space [especially as it relates to what I'm currently working on at work] and deduce that the service must have some serious architectural flaws which have nothing to do with the reason usually thrown about by non-technical pundits (i.e. Ruby on Rails is to blame).

Some of my suspicions were confirmed by a recent post on the Twitter developer blog entitled Twittering about architecture which contains the following excerpt

Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. Over the last year and a half we've tried to make our system behave like a messaging system as much as possible, but that's introduced a great deal of complexity and unpredictability. When we're in crisis mode, adding more instrumentation to help us navigate the web of interdependencies in our current architecture is often our primary recourse. This is, clearly, not optimal.

Our direction going forward is to replace our existing system, component-by-component, with parts that are designed from the ground up to meet the requirements that have emerged as Twitter has grown.

Given that Twitter has some unique requirements that would put to test a number of off-the-shelf and even custom messaging applications, it is shocking that it isn't even architected as a messaging system. This makes sense when you consider the background of the founders was a blogging tool and they originally intended Twitter to be a "micro" blogging service.

If Twitter was simply a micro-content publishing tool with push notifications for SMS and IM then the team wouldn't be faulted for designing it as a Content Management System (CMS). In that case you'd just need three data structures

a persistent store for each users tweets
a cache of their tweets in memory to improve read performance
a persistent list of [IM and SMS] end points subscribed to each users tweets and an asynchronous job (i.e. a daemon) which publishes to each users subscribers after each post

Unfortunately, Twitter isn't just a blogging tool that allows people to subscribe to my posts via SMS & IM instead of just RSS. It also has the notion of followers. That's when things get hairy. Isreal over at AssetBar had a great post about this entitled Twitter-proxy: Any Interest? where he wrote

Consider the messaging problem:

Nothing is as easy as it looks. When Robert Scoble writes a simple “I’m hanging out with…” message, Twitter has about two choices of how they can dispatch that message:

PUSH the message to the queue’s of each of his ~~6,864~~ 24,875 followers, or

Wait for the ~~6,864~~ 24,875 followers to log in, then PULL the message.

The trouble with #2 is that people like Robert also follow ~~6,800~~ 21,146 people. And it’s unacceptable for him to login and then have to wait for the system to open records on ~~6,800~~ 21,146 people (across multiple db shards), then sort the records by date and finally render the data. Users would be hating on the HUGE latency.

So, the twitter model is almost certainly #1. Robert’s message is copied (or pre-fetched) to 6,864 users, so when those users open their page/client, Scoble’s message is right there, waiting for them. The users are loving the speed, but Twitter is hating on the writes. All of the writes.

How many writes?

A 6000X multiplication factor:

Do you see a scaling problem with this scenario?

Scoble writes something–boom–~~6,800~~ 21,146 writes are kicked off. 1 for each follower.

Michael Arrington replies–boom–another ~~6,600~~ 17,918 writes.

Jason Calacanis jumps in –boom–another ~~6,500~~ 25,972 writes.

Beyond the ~~19,900~~ 65,036 writes, there’s a lot of additional overhead too. You have to hit a DB to figure out who the ~~19,900~~ 65,036 followers are. Read, read, read. Then possibly hit another DB to find out which shard they live on. Read, read, read. Then you make a connection and write to that DB host, and on success, go back and mark the update as successful. Depending on the details of their messaging system, all the overhead of lookup and accounting could be an even bigger task than the ~~19,900~~ 65,036 reads + ~~19,900~~ 65,036 writes. Do you even want to think about the replication issues (multiply by 2 or 3)? Watch out for locking, too.

And here’s the kicker: that giant processing & delivery effort–possibly a combined ~~100K~~ 130K disk IOs– was caused by 3 users, each just sending one, tiny, 140 char message. How innocent it all seemed.

Not only does Isreal's post accurately describes the problem with the logical model for Twitter's "followers" feature, it looks like he may have also nailed the details of their implementation which would explain the significant issues they've had scaling the site. The problem is that if you naively implement a design that simply reflects the problem statement then you will be in disk I/O hell. It won't matter if you are using Ruby on Rails, Cobol on Cogs, C++ or hand coded assembly, the read & write load will kill you.

This leads me to my new mantra which I've stolen from Jim Gray via Pat Helland; DISK IS THE NEW TAPE.

In addition, the fact that the Twitter folks decided not to cap the number of followers or following may have saved them from the kind of flames that Robert Scoble sends at Facebook for having a 5000 friend limit but it also means that they not only have to deal with supporting users with thousands of followers but also users that follow thousands of users [both of which would be optimized in completely different ways]. Clearly a lot of feature decisions have been made on that product without thought to how they impact the scalability of the service.

PS: If this problem space sounds interesting to you, we're hiring. I'm specifically looking for good operations folks. Shoot me a resume at dareo@msft.com - replace msft.com with microsoft.com

Now Playing: Rick Ross - Maybach Music (featuring Jay-Z)

Categories: Web Development

May 21, 2008

@ 02:21 PM

Comments [12]

Note to Web 2.0 Companies: Early Adopters are not the Mass Market

If you work in the technology industry it pays to be familiar with the ideas from Geoffrey Moore's insightful book Crossing the Chasm. In the book he takes a look at the classic marketing bell curve that segments customers into Early Adopters, Pragmatists, Conservatives and Laggards then points out that there is a large chasm to cross when it comes to becoming popular beyond an initial set of early adopters. There is a good review of his ideas in Eric Sink's blog post entitled Act Your Age which is excerpted below

The people in your market segment are divided into four groups:

Early Adopters are risk takers who actually like to try new things.

Pragmatists might be willing to use new technology, if it's the only way to get their problem solved.

Conservatives dislike new technology and try to avoid it.

Laggards pride themselves on the fact that they are the last to try anything new.

This drawing reflects the fact that there is no smooth or logical transition between the Early Adopters and the Pragmatists. In between the Early Adopters and the Pragmatists there is a chasm. To successfully sell your product to the Pragmatists, you must "cross the chasm".

The knowledge that the needs of early adopters and those of the majority of your potential user base differ significantly is extremely important when building and marketing any technology product. A lot of companies have ended up either building the wrong product or focusing their product too narrowly because they listened too intently to their initial customer base without realizing that they were talking to early adopters.

The fact is that early adopters have different problems and needs from regular users. This is especially true when you compare the demographics of the Silicon Valley early adopter crowd which "Web 2.0" startups often try to court with the typical users of social software on the Web. In the few years I've been working on building Web applications, I've seen a number of technology trends and products that have been heralded as the next big thing by technology pundits which actually never broke into the mainstream because they don't solve the problems of regular Internet users. Here are some examples

Blog Search: A few years ago, blog search engines were all the rage. You had people like Marc Cuban talking up IceRocket and Robert Scoble harranguing Web search companies to build dedicated blog search engines. Since then the products in that space have either given up the ghost (e.g. PubSub, Feedster), turned out to be irrelevant (e.g. Technorati, IceRocket) or were sidelined (e.g. Google Blog Search, Yahoo! Blog Search). The problem with this product category is that except for journalists, marketers and ego surfing A-list bloggers there aren't many people who need a specialized feature set around searching blogs.
Social bookmarking: Although del.icio.us popularized a number of "Web 2.0" trends such as tagging, REST APIs and adding social features to a previously individual task, it has never really taken off as a mainstream product. According to the former VC behind the service it seems to have peaked at 2 million unique visitors last year and is now seeing about half that number of unique users. Compare that to Yahoo! bookmarks which was seeing 20 million active users a year and a half ago.
RSS Readers: I've lost track of all of the this is the year RSS goes mainstream articles I've read over the past few years. Although RSS has turned out to be a key technology which powers a number of interesting functionality behind the scenes (e.g. podcasting) actually subscribing and reading news feeds in an RSS reader has not become a mainstream activity of Web users. When you think about it, it is kind of obvious. The problem an RSS reader solves is "I read so many blogs and news sites on daily basis, I need a tool to help me keep them all straight". How many people who aren't enthusiastic early adopters (i) have this problem and (ii) think they need a tool to deal with it?

These are just the first three that came to mind. I'm sure readers can come up with more examples of their own. This isn't to say that all hyped "Web 2.0" sites haven't lived up to their promise. Flickr is an example of an early adopter hyped site that showed up sprinkled with "Web 2.0" goodness that has become a major part of the daily lives of tens of millions of people across the Web.

When you look at the list of top 50 sites in the U.S. by unique visitors it is interesting to note what common theme unites the recent "Web 2.0" entrants into that list. There are the social networking sites like MySpace and Facebook which harness the natural need of young people to express their individuality yet be part of social cliques. Then there are the sites which provide lots of flexible options that enable people to share their media with their friends, family or the general public such as Flickr and YouTube. Both sites also have figured out how to harness the work of the few to entertain and benefit the many as have Wikipedia and Digg as well. Then there are sites like Fling and AdultFriendFinder which seem to now get more traffic than the personal sites you see advertised on TV for obvious reasons.

However the one overriding theme is that all of these recent entrants is that they solve problems that everyone [or at least a large section of the populace] has. Everyone likes to communicate with their social circle. Everyone likes watching funny videos and looking at couple pics. Everyone wants to find information about topics they interested in or find out what's going on around them. Everybody wants to get laid.

If you are a Web 2.0 company in today's Web you really need to ask yourselves, "Are we solving a problem that everybody has or are we building a product for Robert Scoble?"

Now Playing: Three 6 Mafia - I'd Rather (feat. DJ Unk)

Categories: Social Software | Technology

May 21, 2008

@ 02:20 PM

Comments [10]

C# 3.0 Implicit Type Declarations: To var or not to var?

Recently we got a new contributor on RSS Bandit who uses the ReSharper refactoring tool. This has led to a number of changes to our code base due to suggestions from ReSharper. For the most part, I've considered these changes to be benign until recently. A few weeks ago, I grabbed a recent snap shot of the code from SVN and was surprised to see all the type declarations in method bodies replaced by implicitly typed variable declarations (aka the var keyword). Below is an excerpt of what some of the "refactored" code now look likes.

var selectedIsChild = NodeIsChildOf(TreeSelectedFeedsNode, selectedNode);
var isSmartOrAggregated = (selectedNode.Type == FeedNodeType.Finder ||
                                      selectedNode.Type == FeedNodeType.SmartFolder);
//mark all viewed stories as read 
if (listFeedItems.Items.Count > 0){
   listFeedItems.BeginUpdate();

   for (var i = 0; i < listFeedItems.Items.Count; i++){
       var lvi = listFeedItems.Items[i];
       var newsItem = (INewsItem) lvi.Key;

I found this change to be somewhat puzzling since while it may have shortened the code by a couple of characters on each line but at the cost of making the code less readable. For example, I can't tell what the type of the variable named lvi is just by looking at the code.

I did some searching online to see how the ReSharper team justified this refactoring "suggestion" and came across the blog post entitled ReSharper vs C# 3.0 - Implicitly Typed Locals by a member of the ReSharper team which contains the following excerpt

Some cases where it seems just fine to suggest var are:

New object creation expression: var dictionary = new Dictionary<int, string>();

Cast expression: var element = (IElement)obj;

Safe Cast expression: var element = obj as IElement;

Generic method call with explicit type arguments, when return type is generic: var manager = serviceProvider.GetService<IManager>()

Generic static method call or property with explicit type arguments, when return type is generic: var manager = Singleton<Manager>.Instance;

However, various code styles may need to suggest in other cases. For example, programming in functional style with small methods can benefit from suggesting every suitable local variable to be converted to var style. Or may be your project has IElement root interface and you just know that every variable with "element" name is IElement and you don't want explicit types for this case. Probably, any method with the name GetTreeNode() always return ITreeNode and you want vars for all such local variable.

Currently, we have two suggestions: one that suggests every suitable explicitly typed local to be converted to implicitly typed var, and another that suggests according to rules above.

So it seems there are two suggestion modes. The first suggests using the var keyword when the name of the type is obvious from the right hand side expression being evaluated such as casts or new object creation. The second mode suggests replacing type declarations with the var keyword anywhere the compiler can infer the type [which is pretty much everywhere except for a few edge cases such as when you want to initialize a variable with null at declaration].

The first suggestion mode makes sense to me since the code doesn't lose any clarity and it makes for shorter code. The second mode is the one I find problematic it takes information out of the equation to save a couple of characters per line of code. Each time someone is reading the code, they need to resort to using Go To Definition or Auto-Complete features of their IDE to tell something as simple as "what is the type of this object".

Unfortunately, the ReSharper developers seem to have dug in their heels religiously on this topic as can be seen in the post entitled Varification -- Using Implicitly Typed Locals where a number of arguments are made justifying always using implicitly typed variables including

It induces better naming for local variables.
It induces better API.
It induces variable initialization.
It removes code noise
It doesn't require using directive.

It's interesting how not only are almost all of these "benefits" mainly stylistic but how they contradict each other. For example, the claim that it leads to "better naming for local variables" really means it compels developers to use LONGER HUNGARIAN STYLE VARIABLE NAMES. Funny enough, these long variable names add more noise to the code overall since they show up everywhere the variable is used compared to a single type name showing up when the variable is declared. The argument that it leads to "better API" is another variation of this theme since it argues that if you are compelled to use LONGER MORE DESCRIPTIVE PROPERTY NAMES (e.g. XmlNode.XmlNodeName instead of XmlNode.Name) then this is an improvement. Someone should inform the ReSharper folks that encoding type information in variable names sucks, that's why we're using a strongly typed programming language like C# in the first place.

One more thing, the claim that it encourages variable initialization is weird given that the C# compiler already enforces that. More importantly, the common scenario of initializing a variable to null before it is used isn't supported by the var keyword.

In conclusion, it seems to me that someone on the ReSharper team went overboard in wanting to support the new features in C# 3.0 in their product even though in some cases using these features cause more problems than they solve. Amusingly enough, the C# 3.0 language designers foresaw this problem and put the following note in the C# language reference

Remarks: Overuse of var can make source code less readable for others. It is recommended to use var only when it is necessary, that is, when the variable will be used to store an anonymous type or a collection of anonymous types.

Case closed.

Now Playing: Mariah Carey - Side Effects (featuring Young Jeezy)

Categories: Programming

May 17, 2008

@ 12:52 PM

Comments [3]

Two Key Issues that often Hinder Collaboration Between Teams in Large Companies

I've spent all of my professional career working at a large multinational company. In this time I've been involved in lots of different cross-team and cross-divisional collaboration efforts. Some times these groups were in the same organization and other times you would have to go up five to ten levels up the org chart before you found a shared manager. Surprisingly, the presence or lack of shared management has never been the key factor that has helped or hindered such collaborative efforts.

Of all the problems I've seen when I've had to depend on other teams for help in getting a task accomplished or vice versa; there have been two insidious that tend to crop up in situations where things go awry. The first is misaligned goals. Just because two groups are working together doesn't mean they have the same motivations or expected end results. Things quickly go awry when one group's primary goals either run counter to the goal(s) of the group they are supposed to be collaborating with. For example, consider a company that requires its technical support to have very low average call time to meet their metrics. Imagine that same company also puts together a task force to improve the customer satisfaction with the technical support experience after lots of complaints from their customers. What are the chances that the task force will be able to effect positive change if the metrics used to reward their tech support staff remain the same? The funny thing is that large companies often end up creating groups that are working at cross purposes yet are supposed to be working together.

What makes misaligned goals so insidious is that the members of the collaborating groups who are working through the project often don't realize that the problem is that their goals are misaligned. A lot of the time people tend to think the problem is the other group is evil, a bunch of jerks or just plain selfish. The truth is often that the so-called jerks are really just thinking You're not my manager, so I'm not going to ask how high when you tell me to jump. Once you find out you've hit this problem then the path to solving it is clear. You either have to (i) make sure all collaborating parties want to reach the same outcome and place have similar priorities or (ii) jettison the collaboration effort.

Another problem that has scuttled many a collaboration effort is when one or more of the parties involved has undisclosed concerns about the risks of collaborating which prevents them from entering into the collaboration wholeheartedly or even worse has them actively working against it. Software development teams experience this when they have to manage dependences on their project or that they have on other projects. There's a good paper on the topic entitled Managing Cognitive and Affective Trust in the Conceptual R&D Organization by Diane H. Sonnenwald which breaks down the problem of distrust in conceptual organizations (aka virtual teams) in the following way

Two Types of Trust and Distrust: Cognitive and Affective
Two types of trust, cognitive and affective, have been identified as important in organizations (McAllister, 1995; Rocco, et al, 2001). Cognitive trust focuses on judgments of competence and reliability. Can a co-worker complete a task? Will the results be of sufficient quality? Will the task be completed on time? These are issues that comprise cognitive trust and distrust. The more strongly one believes the answers to these types of questions are affirmative, the stronger one’s cognitive trust. The more strongly one believes the answers to these types of questions are negative, the stronger one’s cognitive distrust.

Affective trust focuses on interpersonal bonds among individuals and institutions, including perceptions of colleagues’ motivation, intentions, ethics and citizenship. Affective trust typically emerges from repeated interactions among individuals, and experiences of reciprocated interpersonal care and concern (Rosseau, et al, 1998). It is also referred to as emotional trust (Rocco, et al, 2001) and relational trust (Rosseau, et al, 1998). It can be “the grease that turns the wheel” (Sonnenwald, 1996).

The issue of affective distrust is strongly related to lacking shared goals while working together as a team which I've already discussed. Cognitive distrust typically results in one or more parties in the collaboration acting with the assumption that the collaboration is going to fail. Since these distrusting group(s) assume failure will be the end result of the collaboration they will take steps to insulate themselves from this failure. However what makes this problem insidious is that the "untrusted" groups are often not formally confronted about the lack of trust in their efforts and thus risk mitigation is not formally built into the collaboration effort. Eventually this leads to behavior that is counterproductive to the collaboration as teams try to mitigate risks in isolation and eventually there is distrust between all parties in the collaboration. Project failure often soon follows.

The best way to prevent this from happening once you find yourself in this situation is to put everyone's concerns on the table. Once the concerns are on the table, be they concerns about product quality, timelines or any of the other myriad issues that impact collaboration, mitigations can be put in place. As the saying goes sunlight is the best disinfectant, thus I've also seen that when the "distrusted" team becomes fully transparent in their workings and information disclosure it quickly makes matters clear. Because one of two things will happen; it will either (i) reassure their dependents that their fears are unfounded or (ii) confirm their concerns in a timely fashion. Either of which is preferable to the status quo.

Now Playing: Mariah Carey - Cruise Control (featuring Damian Marley)

Categories: Life in the B0rg Cube

May 17, 2008

@ 12:52 PM

Comments [13]

Offline Web Apps, Dumb Idea or Really Dumb Idea?

Lots of "Web 2.0" pundits like to argue that it is just a matter of time before Web applications make desktop applications obsolete and irrelevant. To many of these pundits the final frontier is the ability to take Web applications offline. Once this happens you get the best of both worlds, the zero install hassle, collaborative nature of Web-based applications married to the ability to take your "apps on a plane". Much attention has been given to this problem which has led to the rise of a number of frameworks designed bring offline capabilities to Web applications the most popular of which is Google Gears. I think the anti-Microsoft sentiment that courses through the "Web 2.0" crowd has created an unwanted solution to a problem that most users don't really have.

Unlike David Heinemeier Hansson in his rant You're not on a fucking plane (and if you are, it doesn't matter)!, I actually think the "offline problem" is a valid problem that we have to solve. However I think that trying to tackle it from the perspective of taking an AJAX application offline is backwards. There are a few reasons I believe this

The user experience of a "rich" Web application pales in comparison to that of a desktop application. If given a choice of using a desktop app and a Web application with the same features, I'd use a desktop application in a heart beat.
The amount of work it takes to "offline enable" a Web application is roughly similar to the amount of work it takes to "online enable" a desktop application. The amount of work it took me to make RSS Bandit a desktop client for Google Reader is roughly equivalent to what it most likely took to add offline reading to Google Reader.
Once you decide to "go offline", your Web application is no longer "zero install" so it isn't much different from a desktop application.

I suspect this is the bitter truth that answers the questions asked in articles like The Frustratingly Unfulfilled Promise of Google Gears where the author laments the lack of proliferation of offline Web applications built on Google Gears.

When it first shipped I was looking forward to a platform like Google Gears but after I thought about the problem for a while, I realized that such a platform would be just as useful for "online enabling" desktop applications as it would be for "offline enabling" Web applications. Additionally, I came to the conclusion that the former is a lot more enabling to users than the latter. This is when I started becoming interested in Live Mesh as a Platform, this is one area where I think Microsoft's hearts and minds are in the right place. I want to see more applications like Outlook + RPC over HTTP not "offline enabled" versions of Outlook Web Access.

Now Playing: Jordin Sparks - No Air (feat. Chris Brown)

Categories: Web Development

May 17, 2008

@ 12:48 PM

Comments [21]

Some Thoughts on Facebook Connect, Google Friend Connect and MySpace Data Availability

Disclaimer: This post does not reflect the opinions, thoughts, strategies or future intentions of my employer. These are solely my personal opinions. If you are seeking official position statements from Microsoft, please go here.

Recently there were three vaporware announcements by Facebook, Google and MySpace each describing a way for other web sites to integrate the user profiles and friends lists from these popular social networking sites. Given that I'm a big fan of social networking sites and interoperability between them, this seemed like an interesting set of announcements. So I decided to take a look at these announcements especially given the timing of them.

What Do They Have in Common?

Marc Canter does a good job of describing the underlying theme behind all three announcements in his post I do not compromise where he writes

three announcements that happened within a week of each other: MySpace’s Data Availability, Facebook’s Connect and Google’s Friend Connect - ALL THREE had fundamentally the same strategy!

They’re all keeping their member’s data on their servers, while sending out tentacles to mesh in with as many outside sites as they can. These tentacles may be widgets, apps or iFrames - but its all the same strategy.

Basically all three announcements argue that instead of trying to build social networking into their services from scratch, Web sites should instead outsource their social graphs and "social features" such as user profiles, friends lists and media sharing from the large social networking sites like Facebook, MySpace and Orkut.

This isn't a new pitch, Facebook has been singing the same song since they announced the beta of the Facebook Platform in August 2006 and Google has been sending Kevin Marks to every conference they can find to give his Social Cloud presentation which makes the same pitch. The new wrinkle to this time worn tale is that Google and Facebook [along with MySpace] are no longer just pitching using REST APIs for integration but are now preaching "no coding required" integration via widgets.

Now that we know the meat of all three announcements we can go over the little specifics that have leaked out about each forthcoming product thus far.

Facebook Connect

Dave Morin gave the first official statement about Facebook Connect news in his blog post Announcing Facebook Connect where he wrote

Trusted Authentication
Users will be able to connect their Facebook account with any partner website using a trusted authentication method. Whether at login, or anywhere else a developer would like to add social context, the user will be able to authenticate and connect their account in a trusted environment. The user will have total control of the permissions granted.

Real Identity
Facebook users represent themselves with their real names and real identities. With Facebook Connect, users can bring their real identity information with them wherever they go on the Web, including: basic profile information, profile picture, name, friends, photos, events, groups, and more.

Friends Access
Users count on Facebook to stay connected to their friends and family. With Facebook Connect, users can take their friends with them wherever they go on the Web. Developers will be able to add rich social context to their websites. Developers will even be able to dynamically show which of their Facebook friends already have accounts on their sites.

Dynamic Privacy
As a user moves around the open Web, their privacy settings will follow, ensuring that users' information and privacy rules are always up-to-date. For example, if a user changes their profile picture, or removes a friend connection, this will be automatically updated in the external website.

The key features to note are (i) a user can associate their Facebook account with their account on a 3rd party site which means (ii) the user's profile and media shared on Facebook can now be exposed on the 3rd party site and (iii) the users friends' on Facebook who have also associated their Facebook account with their account on the 3rd party site will show up as the user's friends on the site.

The "dynamic privacy" claim seems pretty vague if not downright empty. All that is stated above is that the user's changes on Facebook are instantly reflected on 3rd party sites. Duh. Does that need to be called out as a feature?

Google Friend Connect

On the Google Friend Connect page there is the following video

The key features to note are (i) a user can associate their ~~Facebook account~~ OpenID with their account on a 3rd party site which means (ii) the user's profile and media shared on ~~Facebook account~~ a small set of social networking site can now be exposed on the 3rd party site and (iii) the users friends' on ~~Facebook~~ the small set of social network sites who have also associated their ~~Facebook account~~ OpenID using Google Friend Connect to connect their account on the 3rd party site will show up as the user's friends on the site (iv) the user's activities on the 3rd party site are broadcast in her friends' news feeds.

One interesting thing about Google Friend Connect's use of OpenID is that it allows me to associate multiple social network profiles to a single account which may not even be from a social networking site (e.g. using my AOL or Y! email to sign-in but associating it with my Facebook profile & friend list).

Google Friend Connect seems to be powered by Google OpenSocial which is Google's attempt to commoditize the functionality of the Facebook platform by making it easy for any social networking site to roll its own Facebook-style platform by using Google's standard set of REST APIs, Javascript libraries and/or hosting services. In the above video, it is mentioned that Web sites which adopt Google Friend Connect will not only be able to obtain user profile and friend list widgets from Google but also OpenSocial widgets written by 3rd party developers. However since Facebook announced the JavaScript Client Library for Facebook API way back in January they already have the technology in place to offer something similar to Web site owners if this capability becomes in demand. More important will be the set of functionality that comes "out of the box" so to speak since a developer community won't form until Google Friend Connect gains traction.

By the way, it turns out that Facebook has banned Google from interacting with their user data using Google Friend Connect since it violates their terms of service. My assumption is that the problem is Google Friend Connect works by building an OpenSocial wrapper on top of the Facebook API and then exposing it to other web sites as widgets and to OpenSocial gadget developers via APIs. Thus Google is pretty much proxying the Facebook social graph to other sites and developers which takes control of safeguarding/policing access to this user data out of Facebook's hands. Not good for Facebook.

MySpace Data Availability

The only details on the Web about MySpace's Data Availability seems to be second hand data from tech bloggers who were either strategically leaked some details/screenshots or took part in a press release conference call. The best source I found was Mike Arrington's TechCrunch post entitled MySpace Embraces DataPortability, Partners With Yahoo, Ebay And Twitter which contains the following excerpt

MySpace is announcing a broad ranging embrace of data portability standards today, along with data sharing partnerships with Yahoo, Ebay, Twitter and their own Photobucket subsidiary. The new project is being called MySpace “Data Availability” and is an example, MySpace says, of their dedication to playing nice with the rest of the Internet.

A mockup of how the data sharing will look in action with Twitter is shown above. MySpace is essentially making key user data, including (1) Publicly available basic profile information, (2) MySpace photos, (3) MySpaceTV videos, and (4) friend networks, available to partners via their (previousy internal) RESTful API, along with user authentication via OAuth.

The key goal is to allow users to maintain key personal data at sites like MySpace and not have it be locked up in an island. Previously users could turn much of this data into widgets and add them to third party sites. But that doesn’t bridge the gap between independent, autonomous websites, MySpace says. Every site remains an island.

But with Data Availability, partners will be able to access MySpace user data, combine it with their own, and present it on their sites outside of the normal widget framework. Friends lists can be syncronized, for example. Or Twitter may use the data to recommend other Twitter users who are your MySpace friends.

The key difference between MySpace's announcement and those of Facebook & Google is that MySpace has more ground to cover. Since Facebook & Google already have REST APIs that support a delegated authentication model, MySpace is pretty much playing catch up here.

In fact, on careful rereading it seems MySpace's announcement isn't like the others since the only concrete technology announced above is a REST API that uses a user-centric delegated authentication model which is something both Google and Facebook have had for years (see GData/OpenSocial and Facebook REST API).

Given my assumption that MySpace is not announcing anything new to the industry, the rest of this post will focus on Google Friend Connect and Facebook Connect.

The Chicken & Egg Problem

When it comes to social networking, it is all about network effects. A social networking feature or site is only interesting to me if my friends are using it as well.

The argument that a site is better off using a user's social graph from a big social networking site like Facebook instead of building their own social network features only makes sense if (i) there is enough overlap in the user's friends list on Facebook and that on the site AND (ii) the user's friends on the site who are also his friends on Facebook can be discovered by the user. The latter is the tough part and one I haven't seen a good way of bridging without resorting to anti-patterns (i.e. pull the email addresses of all of the user's friends from Facebook and then cross-reference with the email addresses of the sites users). This anti-pattern works when you are getting the email addresses the user entered by hand from some Webmail address book (e.g. Hotmail, Gmail, Y! mail, etc).

However since Google and Facebook are going with a no-code solution, the only way to tell which of my Facebook friends also use the 3rd site is if they have also opted-in to linking their account on the site with their Facebook profile. This significantly weakens the network effects of the feature compared to the find your friends on your favorite "Web 2.0" site which a lot of sites have used to grow their user base by screen scraping Webmail address books then cross referencing it with their user databases.

How Does this Relate to Data Portability and Social Network Interoperability?

Short answer; it doesn't.

Long answer; the first thing to do is to make sure you understand what is meant by Data Portability and Social Network Interoperability. The difference between Data Portability and Social Network Interoperability is the difference between being able to export your email inbox and address book from Gmail into Outlook or vice versa (portable) and being able to send an email from a Gmail address to someone using Outlook or Hotmail (interoperable).

So do these new widget initiatives help portability? Nope. Widgets give developers less options for obtaining and interacting with the user data than APIs. With Facebook's REST API, I know how to get my friends list with profile data into Outlook and my Windows Mobile phone via OutSync. I would actually lose that functionality if it was only exposed via a widget. The one thing they do is lower the bar for integration by people who don't know how to code.

Well, how about interoperability? The idea of social network interoperability is that instead of being a bunch of walled gardens and data silos, social networking sites can talk to each other the same way email services and [some] IM services can talk to each other today. The "Use our data silo instead of building your own" pitch may reduce the number of data silos but it doesn't change the fact that the Facebooks and MySpaces of the world are still fundamentally data silos when it comes to the social graph. That is what we have to change. Instead we keep getting distracted along the way by shiny widgets.

PS: The blog hiatus is over. It was fun while it lasted. ;)

Now Playing: Fugees (Refugee Camp) - Killing Me Softly

Categories: Competitors/Web Companies | Social Software

March 5, 2008

@ 04:00 AM

Comments [0]

Indefinite Hiatus

I’ve been writing a personal weblog for almost seven years. It’s weird to go back and read some of the posts in my old kuro5hin diary such as my early postings about interning at Microsoft and see how much my perspectives have changed in some ways and stayed the same in others. Anyway…

Although I’ve found this weblog to be personally fulfilling, the time has come for me to put it aside for the time being. This will be the last post on http://www.25hoursaday.com/weblog.

In addition, I’ll be cleaning up my Twitter and Facebook profiles by removing anyone who I haven’t personally met from my list of followers and friends respectively.

I will continue to work on and blog about RSS Bandit. I haven’t yet picked a location for a new blog for the project. However this shouldn’t impact subscribers to my RSS Bandit feed since it is already hosted on Feedburner and a redirect shouldn’t be noticeable.

Thanks for everything.

PS: See also The Year the Blog Died.

Now playing: Boyz II Men - End of the Road

Categories: Life in the B0rg Cube | Personal

March 4, 2008

@ 04:00 AM

Comments [7]

IE 8 Will Do the Right Thing

Dean Hachamovitch who runs the Internet Explorer team has a blog post entitled Microsoft's Interoperability Principles and IE8 where he addresses some of the recent controversy about how rendering pages according to Web standards will work in IE 8. He wrote

The Technical Challenge

One issue we heard repeatedly during the IE7 beta concerned sites that looked fine in IE6 but looked bad in IE7. The reason was that the sites had worked around IE6 issues with content that – when viewed with IE7’s improved Standards mode – looked bad.

As we started work on IE8, we thought that the same thing would happen in the short term: when a site hands IE8 content and asks for Standards mode, that content would expect IE7’s Standards mode and not appear or function correctly.

In other words, the technical challenge here is how can IE determine whether a site’s content expects IE8’s Standards mode or IE7’s Standards mode? Given how many sites offer IE very different content today, which should IE8 default to?

Our initial thinking for IE8 involved showing pages requesting “Standards” mode in an IE7’s “Standards” mode, and requiring developers to ask for IE8’s actual “Standards” mode separately. We made this decision, informed by discussions with some leading web experts, with compatibility at the top of mind.

In light of the Interoperability Principles, as well as feedback from the community, we’re choosing differently. Now, IE8 will show pages requesting “Standards” mode in IE8’s Standards mode. Developers who want their pages shown using IE8’s “IE7 Standards mode” will need to request that explicitly (using the http header/meta tag approach described here).

Going Forward

Long term, we believe this is the right thing for the web.

I’m glad someone came to this realization. The original solution was simply unworkable in the long term regardless of how much short term pain it eased. Kudos to the Internet Explorer team for taking the long view and doing what is best for the Web. Is it me or is that the most positive the comments have ever been on the IE blog?

PS: It is interesting to note that this is the second time in the past week Microsoft has announced a technology direction related to Web standards and changed it based on feedback from the community.

Now playing: Usher - In This Club (feat. Young Jeezy)

Categories: Web Development

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Monday, 26 May 2008 - Dare Obasanjo's weblog

Consider the messaging problem:

A 6000X multiplication factor:

What Do They Have in Common?

Facebook Connect

Google Friend Connect

MySpace Data Availability

The Chicken & Egg Problem

How Does this Relate to Data Portability and Social Network Interoperability?