Friday, 30 May 2008 - Dare Obasanjo's weblog

May 30, 2008

@ 04:21 PM

When Google Gears was first announced, it was praised as the final nail in the coffin for desktop applications as it now made it possible to take Web applications offline. However in the past year that Gears has existed, there hasn't been as much progress or enthusiasm in taking applications offline as was initially thought when Gears was announced. There are various reasons for this and since I've already given my thoughts on taking Web applications offline so I won't repeat myself in this post. What I do find interesting is that many proponents of Google Gears including technology evangelists at Google have been gradually switching to pushing Gears as a way to route around browsers and add features to the Web, as opposed to just being about an 'offline solution'. Below are some posts from the past couple of months showing this gradual switch in positioning.

Alex Russell of the Dojo Toolkit wrote a blog post entitled Progress Is N+1 in March of this year that contained the following excerpt

Every browser that we depend on either needs an open development process or it needs to have a public plan for N+1. The goal is to ensure that the market knows that there is momentum and a vehicle+timeline for progress. When that’s not possible or available, it becomes incumbent on us to support alternate schemes to rev the web faster. Google Gears is our best hope for this right now, and at the same time as we’re encouraging browser venders to do the right thing, we should also be championing the small, brave Open Source team that is bringing us a viable Plan B. Every webdev should be building Gear-specific features into their app today, not because Gears is the only way to get something in an app, but because in lieu of a real roadmap from Microsoft, Gears is our best chance at getting great things into the fabric of the web faster. If the IE team doesn’t produce a roadmap, we’ll be staring down a long flush-out cycle to replace it with other browsers. The genius of Gears is that it can augment existing browsers instead of replacing them wholesale. Gears targets the platform/product split and gives us a platform story even when we’re neglected by the browser vendors.
Gears has an open product development process, an auto-upgrade plan, and a plan for N+1.
At this point in the webs evolution, I’m glad to see browser vendors competing and I still feel like that’s our best long-term hope. But we’ve been left at the altar before, and the IE team isn’t giving us lots of reasons to trust them as platform vendors (not as product vendors). For once, we have an open, viable Plan B.
Gears is real, bankable progress.

This was followed up by a post by Dion Almaer who's a technical evangelist at Google who wrote the following in his post Gears as a bleeding-edge HTML 5 implementation

I do not see HTML 5 as competition for Gears at all. I am sitting a few feet away from Hixie’s desk as I write this, and he and the Gears team have a good communication loop.
There is a lot in common between Gears and HTML 5. Both are moving the Web forward, something that we really need to accelerate. Both have APIs to make the Web do new tricks. However HTML 5 is a specification, and Gears is an implementation.
...
Standards bodies are not a place to innovate, else you end up with EJB and the like.
...
Gears is a battle hardened Web update mechanism, that is open source and ready for anyone to join and take in interesting directions.

and what do Web developers actually think about using Google's technology as a way to "upgrade the Web" instead of relying on Web browsers and standards bodies for the next generation of features for the Web? Here's one answer from Matt Mullenweg, founder of WordPress, taken from his post Infrastructure as Competitive Advantage

When a website “pops” it probably has very little to do with their underlying server infrastructure and a lot to do with the perceived performance largely driven by how it’s coded at the HTML, CSS, and Javascript level. This, incidentally, is one of the reasons Google Gears is going to change the web as we know it today - LocalServer will obsolete CDNs as we know them. (Look for this in WordPress soonish.)

That's a rather bold claim (pun intended) by Matt. If you're wondering what features Matt is adding to WordPress that will depend on Gears, they were recently discussed in Dion Almaer's post Speed Up! with Wordpress and Gears which is excerpted below

However, Gears is so much more than offline, and it is really exciting to see “Speed Up!” as a link instead of “Go Offline?”
This is just the beginning. As the Gears community fills in the gaps in the Web development model and begins to bring you HTML5 functionality I expect to see less “Go Offline” and more “Speed Up!” and other such phrases. In fact, I will be most excited when I don’t see any such linkage, and the applications are just better.
With an embedded database, local server storage, worker pool execution, desktop APIs, and other exciting modules such as notifications, resumable HTTP being talked about in the community…. I think we can all get excited.

Remember all those rumors back in the day that Google was working on their own browser? Well they've gone one better and are working on the next Flash. Adobe likes pointing out that Flash has more market share than any single browser and we all know that has Flash has gone above and beyond the [X]HTML standards bodies to extend the Web thus powering popular, rich user experiences that weren't possible otherwise (e.g. YouTube). Google is on the road to doing the same thing with Gears. And just like social networks and content sharing sites were a big part in making Flash an integral part of the Web experience for a majority of Web users, Google is learning from history with Gears as can be seen by the the recent announcements from MySpace. I expect we'll soon see Google leverage the popularity of YouTube as another vector to spread Google Gears.

So far none of the Web sites promoting Google Gears have required it which will limit its uptake. Flash got ahead by being necessary for sites to even work. It will be interesting to see if or when sites move beyond using Gears for nice-to-have features and start requiring it to function. It sounds crazy but I never would have expected to see sites that would be completely broken if Flash wasn't installed five years ago but it isn't surprising today (e.g. YouTube).

PS: For those who are not familiar with the technical details of Google Gears, it currently provides three main bits of functionality; thread pools for asynchronous operations, access to a SQL database running on the user's computer, and access to the user's file system for storing documents, images and other media. There are also beta APIs which provide more access to the user's computer from the browser such as the Desktop API which allows applications to create shortcuts on the user's desktop.

Now Playing: Nas - It Ain't Hard To Tell

Categories: Platforms | Web Development

May 30, 2008

@ 04:19 PM

Comments [3]

Participation as Social Capital: The Fundamental Flaw of Social News Sites

I started thinking about the problems inherent in social news sites recently due to a roundabout set of circumstances. Jeff Atwood wrote a blog post entitled It's Clay Shirky's Internet, We Just Live In It which linked to a post he made in 2005 titled A Group Is Its Own Worst Enemy which linked to my post on the issues I'd seen with kuro5hin, an early social news site which attempted to "fix" the problems with Slashdot's model but lost its technology focus along the way.

A key fact about online [and offline] communities is that the people who participate the most eventually influence the direction and culture of the community. Kuro5hin tried to fix two problems with Slashdot, the lack of a democratic voting system and the focus on mindless link propagation instead of deeper, more analytical articles. I mentioned how this experiment ended up in my post Jeff linked to which is excerpted below

Now five years later, I still read Slashdot every day but only check K5 out every couple of months out of morbid curiosity. The democracy of K5 caused two things to happen that tended to drive away the original audience. The first was that the focus of the site ended up not being about technology mainly because it is harder for people to write technology articles than write about everyday topics that are nearer and dearer to their hearts. Another was that there was a steady influx of malicious users who eventually drove away a significant proportion of K5's original community, many of whom migrated to HuSi. This issue is lamented all the time on K5 in comments such as an exercise for rusty and the editors. and You don't understand the nature of what happened.
Besides the malicious users one of the other interesting problems we had on K5 was that the number of people who actually did things like rate comments was very small relative to the number of users on the site. Anytime proposals came up for ways to fix these issues, there would often be someone who disregarded the idea by stating that we were "seeking a technical solution to a social problem". This interaction between technology and social behavior was the first time I really thought about social software.

The common theme underscoring both problems that hit the site is that they are all due to the cost of participation. It is easier to participate if you are writing about politics during an election year than if you have to write some technical article about the feasibility of adding garbage collection to C++ or analysis of distributed computing technologies. So users followed the path of least resistance. Similarly, cliques of malicious users and trolls have lots of time on their hands by definition and Kuro5hin never found a good way to blunt their influence. Slashdot's system of strong editorial control and meta-moderation of comment ratings actually turned out to be strengths compared to kuro5hin's more democratic and libertarian approach.

This line of thinking leads me to Giles Bowkett very interesting thoughts about social news sites like Slashdot, Digg and Reddit in his post Summon Monsters? Open Door? Heal? Or Die? where he wrote

A funny thing about these sites is that they know about this problem. Hacker News is very concerned about not turning into the next Reddit; Reddit was created as a better Digg; and Digg's corporate mission statement is "at least we're not Slashdot." None of them seem to realize that the order from least to most horrible is identical to the order from youngest to oldest, or that every one of them was good once and isn't any longer.
...
When you build a system where you get points for the number of people who agree with you, you are building a popularity contest for ideas. However, your popularity contest for ideas will not be dominated by the people with the best ideas, but the people with the most time to spend on your web site. Votes appear to be free, like contribution is with Wikipedia, but in reality you have to register to vote, and you have to be there frequently for your votes to make much difference. So the votes aren't really free - they cost time. If you do the math, it's actually quite obvious that if your popularity contest for ideas inherently, by its structure, favors people who waste their own time, then your contest will produce winners which are actually losers. The most popular ideas will not be the best ideas, since the people who have the best ideas, and the ability to recognize them, also have better things to do and better places to be.
Even if you didn't know about the long tail, you'd look for the best ideas on Hacker News (for example) not in its top 10 but in its bottom 1000, because any reasonable person would expect this effect - that people who waste their own time have, in effect, more votes than people who value it - to elevate bad but popular ideas and irretrievably sink independent thinking. And you would be right. TechCrunch is frequently in HN's top ten.

I agree with everything excerpted above except for the implication that all of these sites want to be "better" than their predecessors. I believe that Digg simply wants to be more popular (i.e. garner more page views) than its predecessors and competitors. If the goal of a site is to generate page views then a there is nothing wrong with a popularity contest. However the most popular ideas are hardly ever the best ideas, they are often simply the most palatable to the target audience.

As a user, being popular in such online communities requires two things; being prolific and knowing your audience. If you know your audience, it isn't hard to always generate ideas that will be popular with them. And once you start generating content on a regular basis, you eventually become an authority. This is what happened with MrBabyMan of Digg (and all the other Top Diggers) who has submitted thousands of articles to the site and voted on tens of thousands of articles. This is also what happened with Signal 11 of Slashdot almost a decade ago (damn, I'm getting old). In both the case of MrBabyMan (plus other Top Diggers) and Signal 11, some segment of the user base eventually cottoned on to the fact that participation in a social news site is a game and rallied against the users who are "winning" the game. Similarly in both cases, the managers of the community tried to blunt the rewards of being a high scorer - in Slashdot's case it was with the institution of the karma cap while Digg did it by getting rid of the list of top Diggers.

Although turning participation in your online community into a game complete with points and a high score table is a good tactic to gain an initial set of active users, it does not lead to a healthy or diverse community in the long run. Digg and Slashdot both eventually learned this and have attempted to fix it in their own ways.

Social news sites like Reddit & Digg also have to contend with the fact that the broader their audience gets the less controversial and original their content will be since the goal of such sites is to publish the most broadly popular content on the front page. Additionally, ideas that foster group think will gain in popularity as the culture and audience of the site congeals. Once that occurs, two things will often happen to the site (i) growth will flatten out since there is now a set audience and culture for the site and (ii) the original crop of active users will long for the old days and gripe a lot about how things have changed. This has happened to Slashdot, Kuro5hin, Reddit and every other online community I've watched over time.

This is cycle and fundamental flaw of social news sites will always happen because A Group Is Its Own Worst Enemy.

Now Playing: Notorious B.I.G. - You're Nobody (Til Somebody Kills You)

Categories: Social Software

May 28, 2008

@ 01:57 PM

Comments [1]

Doomed to Repeat History: Will Social Networking Platforms End Up Like Mobile Platforms?

A few months ago Michael Mace, former Chief Competitive Officer and VP of Product Planning at Palm, wrote an insightful and perceptive eulogy for mobile application platforms entitled Mobile applications, RIP where he wrote

Back in 1999 when I joined Palm, it seemed we had the whole mobile ecosystem nailed. The market was literally exploding, with the installed base of devices doubling every year, and an incredible range of creative and useful software popping up all over. In a 22-month period, the number of registered Palm developers increased from 3,000 to over 130,000. The PalmSource conference was swamped, with people spilling out into the halls, and David Pogue took center stage at the close of the conference to tell us how brilliant we all were.
It felt like we were at the leading edge of a revolution, but in hindsight it was more like the high water mark of a flash flood.
...

Two problems have caused a decline the mobile apps business over the last few years. First, the business has become tougher technologically. Second, marketing and sales have also become harder.
From the technical perspective, there are a couple of big issues. One is the proliferation of operating systems. Back in the late 1990s there were two platforms we had to worry about, Pocket PC and Palm OS. Symbian was there too, but it was in Europe and few people here were paying attention. Now there are at least ten platforms. Microsoft alone has several -- two versions of Windows Mobile, Tablet PC, and so on. [Elia didn't mention it, but the fragmentation of Java makes this situation even worse.]
I call it three million platforms with a hundred users each (link).
...
In the mobile world, what have we done? We created a series of elegant technology platforms optimized just for mobile computing. We figured out how to extend battery life, start up the system instantly, conserve precious wireless bandwidth, synchronize to computers all over the planet, and optimize the display of data on a tiny screen.
But we never figured out how to help developers make money. In fact, we paired our elegant platforms with a developer business model so deeply broken that it would take many years, and enormous political battles throughout the industry, to fix it -- if it can ever be fixed at all.

So what does this have to do with social networking sites? The first excerpt from the post where it talks about 130,000 registered developers for the Palm OS sounds a lot like the original hype around the Facebook platform with headlines screaming Facebook Platform Attracts 1000 Developers a Day.

The second excerpt talks about the time there became two big mobile platforms, analogous to the appearance of Google's OpenSocial on the scene as a competing platform used by a consortium of Facebook's competitors. This means that widget developers like Slide and RockYou have to target one set of APIs when building widgets for MySpace, LinkedIn, & Orkut and another completely different set of APIs when building widgets for Facebook & Bebo. Things will likely only get worse. One reason for this is that despite API standardization, all of these sites do not have the same features. Facebook has a Web-based IM, Bebo does not. Orkut has video uploads, LinkedIn does not. All of these differences eventually creep into such APIs as "vendor extensions". The fact that both Google and Facebook are also shipping Open Source implementations of their platforms (Shindig and fbOpen) makes it even more likely that the social networking sites will find ways to extend these platforms to suit their needs.

Finally, there's the show-me-the-money problem. It still isn't clear how one makes money out of building on these social networking platforms. Although companies like Photobucket and Slide have gotten a quarter of a billion and half a billion dollar valuations these have all been typical "Web 2.0" zero profit valuations. This implies that platform developers don't really make money but instead are simply trying to gather a lot of eyeballs then flip to some big company with lots of cash and no ideas. Basically it's a VC funded lottery system. This doesn't sound like the basis of successful and long lived platform such as what we've seen with Microsoft's Windows and Office platforms or Google's Search and Adwords ecosystem. In these platforms there are actually ways for companies to make money by adding value to the ecosystem, this seems more sustainable in the long run than what we have today in the various social networking widget platforms.

It will be interesting to see if the history repeats itself in this instance.

Now Playing: Rick Ross - Luxury Tax (featuring Lil Wayne, Young Jeezy and Trick Daddy)

Categories: Platforms | Social Software

May 28, 2008

@ 01:56 PM

Comments [2]

Not Turtles, AtomPub All the Way Down

Last month Anil Dash commented on recent announcements about Microsoft's strategy around Web protocols in a post entitled Atom Wins: The Unified Cloud Database API where he wrote

AtomPub has become the standard for accessing cloud-based data storage. From Google's GData implementation to Microsoft's unified storage platform that includes SQL Server, Atom is now a consistent interface to build applications that can talk to remote databases, regardless of who provides them. And this milestone has come to pass with almost no notice from the press that covers web APIs and technology.

I don't think the Atom publishing protocol can be considered the universal protocol for talking to remote databases given that cloud storage vendors like Amazon and database vendors like Oracle don't support it yet. That said, this is definitely a positive trend. Back in the RSS vs. Atom days I used to get frustrated that people were spending so much time reinventing the wheel with an RSS clone when the real gaping hole in the infrastructure was a standard editing protocol. It took a little longer than I expected (Sam Ruby started talking about in 2003) but the effort has succeeded way beyond my wildest dreams. All I wanted was a standard editing protocol for blogs and content management systems and we've gotten so much more.

Microsoft is using AtomPub as the interface to a wide breadth of services and products as George Moore points out in his post A Unified Standards-Based Protocols and Tooling Platform for Storage from Microsoft which states

For the first time ever we have a unified protocol and developer tooling story across most of our major storage products from Microsoft:

On-premises structured storage: SQL Server
Cloud-based structured services: SQL Server Data Services
Cloud-based Live storage services: Spaces Photos and Application Data storage

The unified on-the-wire format is Atom, with the unified protocol being AtomPub across all of the above storage products and services. For on-premises access to SQL Server, placing an AtomPub interface on top of your data + business logic is ideal so that you can easily expose that end point to everybody that wants to consume it, whether they are inside or outside your corporate network.

And a few weeks after George's post even more was revealed in posts such as this one about FeedSync and Live Mesh where we find out

Congratulations to the Live Mesh team, who announced their Live Mesh Technology Preview release earlier this evening! Amit Mital gives a detailed overview in this post on http://dev.live.com.
You can read all about it in the usual places...so why do I mention it here? FeedSync is one of the core parts of the Live Mesh platform. One of the key values of Live Mesh is that your data flows to all of your devices. And rather than being hidden away in a single service, any properly authenticated user has full bidirectional sync capability. As I discussed in the Introduction to FeedSync, this really makes "your stuff yours".

Okay, FeedSync isn't really AtomPub but it does use the Atom syndication format so I count that as a win for Atom+APP as well. As time goes on, I hope we'll see even more products and services that support Atom and AtomPub from Microsoft. Standardization at the protocol layer means we can move innovation up the stack. We've seen this happen with HTTP and XML, now hopefully we'll see it happen with Atom + AtomPub.

Finally, I think it's been especially cool that members of the AtomPub community are seeing positive participation from Microsoft, thanks to the folks on the Astoria team who are building ADO.NET Data Services (AtomPub interfaces for interacting with on-site and cloud based SQL Server databases). Kudos to Pablo, Mike Flasko, Andy Conrad and the rest of the Astoria crew.

Now Playing: Mario - Crying Out For Me (remix)

Categories:

May 26, 2008

@ 03:43 PM

Comments [3]

Some Thoughts on Single Instance Storage and Twitter

In response to my recent post about Twitter's architectural woes, Robert Scoble penned a missive entitled Should services charge “super users”? where he writes

First of all, Twitter doesn’t store my Tweets 25,000 times. It stores them once and then it remixes them. This is like saying that Exchange stores each email once for each user. That’s totally not true and shows a lack of understanding how these things work internally.
Second of all, why can FriendFeed keep up with the ever increasing load? I have 10,945 friends on FriendFeed (all added in the past three months, which is MUCH faster growth than Twitter had) and it’s staying up just fine.

My original post excerpted a post by Isreal L'Heureux where he pointed out that there are two naive design choices one could make while designing a system like Twitter. When Robert Scoble posts a tweet you could either

PUSH the message to the queue’s of each of his ~~6,864~~ 24,875 followers, or
Wait for the ~~6,864~~ 24,875 followers to log in, then PULL the message.

Isreal assumed it was #1 which has its set of pros and cons. The pro is that fetching a users inbox is fast and the inbox can easily be cached. The con is that the write I/O is ridiculous. Robert Scoble's post excerpted above argues that Twitter must be using #2 since that is what messaging systems like Microsoft Exchange do.

The feature of Exchange that Robert is talking about is called Single Instance Storage and is described in a Microsoft knowledge base article 175706 as follows

If a message is sent to one recipient, and if the message is copied to 20 other recipients who reside in the same mailbox store, Exchange Server maintains only one copy of the message in its database. Exchange Server then creates pointers.
...
For example, assume the following configuration:
Server 1 Mailbox Store 1: Users A, B, and C
Server 2 Mailbox Store 1: User D
When User A sends the same message to User B, User C, and User D, Exchange Server creates a single instance of the message on server 1 for all three users, because User A, User B, and User C are on the same server. Even the message that is in User A's Sent Items folder is a single instance of the messages that is in the Inboxes of User B and User C. Because User D is on a different server, Exchange Server sends a message to server 2 that will be stored on that server.

So now we know how Exchange's single instance storage works we can talk about how it helps or hinders the model where users pull messages from all the people they are following when they log-in. A pull model assumes that you have a single instance of each message sent by a Twitter user and then when Robert Scoble logs in you execute a query of the form

SELECT TOP 20 messages.author_id, messages.body, messages.publish_date
FROM  messages
WHERE messages.author_id  in (SELECT following.id from following where user_id=$scoble_id)
ORDER BY messages.publish_date DESC;

in this model there's only ever one instance of a message and it lives in the messages table. Where does this break down? Well, what happens when you have more than one database per server or more than one set of database servers or more than one data center? This is why Exchange still has multiple message instances per mailbox store even if they do have single instances shared by all recipients of the message in each mailbox store on a single server.

How much does having multiple databases or multiple DB servers affect the benefits of single instancing of messages? I'll let the Exchange team answer that with their blog post from earlier this year titled Single Instance Storage in Exchange 2007

Over the years, the importance of single instance storage (SIS) in Exchange environments has gradually declined. Ten years ago, Exchange supported about 1000 users per server, with all users on a single database. Today, a typical mailbox server might have 5000 users spread over 50 databases. With one-tenth as many users per database (100 users vs. 1000 users), the potential space savings from single-instancing is reduced by 10x. The move to multiple databases per server, along with the reduction in space savings due to items being deleted over time, means that the space savings from SIS are quite small for most customers. Because of this, we've long recommend that customers ignore SIS when planning their storage requirements for Exchange.
...
Looking forward
Given current trends, we expect the value of single instance storage to continue to decline over time. It's too early for us to say whether SIS will be around in future versions of Exchange. However, we want everyone to understand that it is being deemphasized and should not be a primary factor in today's deployment or migration plans.

At the end of the day, the cost of a naive implementation of single instancing in a situation like Twitter's is latency and ridiculous read I/O. You will have to query data from tens of thousands of users across multiple database every time Robert Scoble logs in.

Exchange gets around the latency problem with the optimization of always having an instance of the message in each database. So you end up with a mix of both a pull & push model. If Scoble blasts a message out to 25,000 recipients on an Exchange system such as the one described in the blog post by the Exchange team, it will end up being written 250 times across 5 servers (100 users per database, 5000 users per server) instead of 25,000 times as would happen in a naive implementation of the PUSH model. Then when each person logs-in to view their message it is assembled on demand from the single instance store local to their mail server instead of trying to read from 250 databases across 5 servers as would happen in a naive implementation of the PULL model.

Scaling a service infrastructure is all about optimizations and trade offs. My original post wasn't meant to imply that Twitter's scaling problems are unsolvable since similar problems have obviously been solved by other online services and Enterprise products in the messaging space. It was meant to inject some perspective in the silly claims that Ruby on Rails is to blame and give readers some perspective on the kind of architectural problems that have to be solved.

Now Playing: Lil Wayne - Got Money (feat. T-Pain)

Categories: Web Development

May 26, 2008

@ 03:42 PM

Comments [2]

Having the Right Users is More Important than Having the Right Features

An important aspect of the growth of a social software application is how well it takes advantage of network effects. Not only should users get value from using the application, they should also get value from the fact that other people use the software as well. Typical examples of technologies that require network effects to even be useful are fax machines and email. Having a fax machine or an email address is useless if no one else has one and they get more useful as more people you communicate with get one. Of course, the opposite is also true and the value of your product declines more rapidly once your user base starts to shrink.

The important balancing act for social is learning how to take as much advantage of the network effects of their application as possible yet make sure the application still provides value even without the presence of network effects. Three examples of Web applications that have done this well are Flickr, MySpace and YouTube. Even without social features, Flickr is a great photo hosting site. However its usage of tagging [which encourages discovery of similar pictures] and social networking [where you get to see your friends photo streams] allows the site to benefit from network effects. Even without a social networking component, MySpace works well as a way for people and corporate brands to represent themselves online in what I'd like to think of as "Geocities 2.0". The same goes for YouTube.

All three of these sites had to face the technology adoption curve and cross the chasm between early adopters & pragmatists/conservatives. One thing that makes social Web sites a lot different from other kinds of technology is that potential customers can not only cheaply try them out but can also can easily see the benefits that others are getting out of the site. This means that it is very important to figure out how to expose the pragmatists and conservative technology users to how much value early adopters are getting from your service. There are still only two ways of really getting this to happen "word of mouth" and explicit advertising. In a world full of "Web 2.0" sites competing for people's attention advertising is not only expensive but often ineffective. Thus word of mouth is king.

So now that we've agreed that word of mouth is important, the next thing to wonder about is how to build it into your product? The truth is that you not only have to create a great product, you also have to create passionate users. Kathy Sierra had two great charts on her site which I've included below

What is important isn't building out your checklist of features, it is making sure you have the right features with the right user experience for your audience. Although that sounds obvious, I've lost count of the amount of churn I've seen in the industry as people react to some announcement by Google, Facebook, or some cool new startup without thinking about how it first into their user experience. When you are constantly adding features to your site without a cohesive vision as to how it all hangs together, your user experience will be crap. Which means you're unlikely to get users to the point where they love your product and are actively evangelizing it for free.

In addition to building a service that turns your users into evangelists, you want to acquire customers that would be effective evangelists especially if they naturally attract an audience. For example, I realized MySpace was going to blow up like a rocket ship full of TNT when every time I went out clubbing the DJ would have the URL of their MySpace page in a prominent place on the stage. I felt similarly about Facebook whenever I heard college students enthusiastically talking about using it as a cool place to hang out online with all their other college friends and alumni. What is interesting to note is how both sites grew from two naturally influential and highly connected demographics to become massively mainstream.

One thing to always remember is that Social Software isn't about features, it is about users. It isn't about one upping the competition in feature checklists it is about capturing the right audience then building an experience that those users can't help evangelizing.

Of course, you need to ensure that when their friends do try out the service they are almost immediately impressed. That is easier said than done. For example, I heard a lot of hype about Xobni and finally got to try it out after hearing lots of enthusiastic evangelism from multiple people. After spending several minutes indexing my inbox I got an app that took up the space of the To-Do sidebar in Outlook with a sidebar full of mostly pointless trivia about my email correspondence. Unfortunately I'd rather know where and when my next meeting is occurring than that I've been on over 1000 emails with Mike Torres. So out went Xobni. Don't let the same happen to your application, as Kathy's graph above says "How soon your users can start kicking ass" directly correlates with how passionate they end up being about your application.

PS: Some people will include viral marketing as a third alternative to traditional advertising and word of mouth. I personally question the effectiveness of this technique which is why I didn't include it above.

Now Playing: Fall Out Boy - I Slept With Someone In Fall Out Boy and All I Got Was This Stupid Song Written About Me

Categories: Social Software

May 23, 2008

@ 03:23 AM

Comments [9]

Some Thoughts on Twitter's Availability Problems

As a regular user of Twitter I've felt the waves of frustration wash over me these past couple of weeks as the service has been hit by one outage after another. This led me to start pondering the problem space [especially as it relates to what I'm currently working on at work] and deduce that the service must have some serious architectural flaws which have nothing to do with the reason usually thrown about by non-technical pundits (i.e. Ruby on Rails is to blame).

Some of my suspicions were confirmed by a recent post on the Twitter developer blog entitled Twittering about architecture which contains the following excerpt

Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. Over the last year and a half we've tried to make our system behave like a messaging system as much as possible, but that's introduced a great deal of complexity and unpredictability. When we're in crisis mode, adding more instrumentation to help us navigate the web of interdependencies in our current architecture is often our primary recourse. This is, clearly, not optimal.

Our direction going forward is to replace our existing system, component-by-component, with parts that are designed from the ground up to meet the requirements that have emerged as Twitter has grown.

Given that Twitter has some unique requirements that would put to test a number of off-the-shelf and even custom messaging applications, it is shocking that it isn't even architected as a messaging system. This makes sense when you consider the background of the founders was a blogging tool and they originally intended Twitter to be a "micro" blogging service.

If Twitter was simply a micro-content publishing tool with push notifications for SMS and IM then the team wouldn't be faulted for designing it as a Content Management System (CMS). In that case you'd just need three data structures

a persistent store for each users tweets
a cache of their tweets in memory to improve read performance
a persistent list of [IM and SMS] end points subscribed to each users tweets and an asynchronous job (i.e. a daemon) which publishes to each users subscribers after each post

Unfortunately, Twitter isn't just a blogging tool that allows people to subscribe to my posts via SMS & IM instead of just RSS. It also has the notion of followers. That's when things get hairy. Isreal over at AssetBar had a great post about this entitled Twitter-proxy: Any Interest? where he wrote

Consider the messaging problem:

Nothing is as easy as it looks. When Robert Scoble writes a simple “I’m hanging out with…” message, Twitter has about two choices of how they can dispatch that message:

PUSH the message to the queue’s of each of his ~~6,864~~ 24,875 followers, or

Wait for the ~~6,864~~ 24,875 followers to log in, then PULL the message.

The trouble with #2 is that people like Robert also follow ~~6,800~~ 21,146 people. And it’s unacceptable for him to login and then have to wait for the system to open records on ~~6,800~~ 21,146 people (across multiple db shards), then sort the records by date and finally render the data. Users would be hating on the HUGE latency.

So, the twitter model is almost certainly #1. Robert’s message is copied (or pre-fetched) to 6,864 users, so when those users open their page/client, Scoble’s message is right there, waiting for them. The users are loving the speed, but Twitter is hating on the writes. All of the writes.

How many writes?

A 6000X multiplication factor:

Do you see a scaling problem with this scenario?

Scoble writes something–boom–~~6,800~~ 21,146 writes are kicked off. 1 for each follower.

Michael Arrington replies–boom–another ~~6,600~~ 17,918 writes.

Jason Calacanis jumps in –boom–another ~~6,500~~ 25,972 writes.

Beyond the ~~19,900~~ 65,036 writes, there’s a lot of additional overhead too. You have to hit a DB to figure out who the ~~19,900~~ 65,036 followers are. Read, read, read. Then possibly hit another DB to find out which shard they live on. Read, read, read. Then you make a connection and write to that DB host, and on success, go back and mark the update as successful. Depending on the details of their messaging system, all the overhead of lookup and accounting could be an even bigger task than the ~~19,900~~ 65,036 reads + ~~19,900~~ 65,036 writes. Do you even want to think about the replication issues (multiply by 2 or 3)? Watch out for locking, too.

And here’s the kicker: that giant processing & delivery effort–possibly a combined ~~100K~~ 130K disk IOs– was caused by 3 users, each just sending one, tiny, 140 char message. How innocent it all seemed.

Not only does Isreal's post accurately describes the problem with the logical model for Twitter's "followers" feature, it looks like he may have also nailed the details of their implementation which would explain the significant issues they've had scaling the site. The problem is that if you naively implement a design that simply reflects the problem statement then you will be in disk I/O hell. It won't matter if you are using Ruby on Rails, Cobol on Cogs, C++ or hand coded assembly, the read & write load will kill you.

This leads me to my new mantra which I've stolen from Jim Gray via Pat Helland; DISK IS THE NEW TAPE.

In addition, the fact that the Twitter folks decided not to cap the number of followers or following may have saved them from the kind of flames that Robert Scoble sends at Facebook for having a 5000 friend limit but it also means that they not only have to deal with supporting users with thousands of followers but also users that follow thousands of users [both of which would be optimized in completely different ways]. Clearly a lot of feature decisions have been made on that product without thought to how they impact the scalability of the service.

PS: If this problem space sounds interesting to you, we're hiring. I'm specifically looking for good operations folks. Shoot me a resume at dareo@msft.com - replace msft.com with microsoft.com

Now Playing: Rick Ross - Maybach Music (featuring Jay-Z)

Categories: Web Development

May 21, 2008

@ 02:21 PM

Comments [12]

Note to Web 2.0 Companies: Early Adopters are not the Mass Market

If you work in the technology industry it pays to be familiar with the ideas from Geoffrey Moore's insightful book Crossing the Chasm. In the book he takes a look at the classic marketing bell curve that segments customers into Early Adopters, Pragmatists, Conservatives and Laggards then points out that there is a large chasm to cross when it comes to becoming popular beyond an initial set of early adopters. There is a good review of his ideas in Eric Sink's blog post entitled Act Your Age which is excerpted below

The people in your market segment are divided into four groups:

Early Adopters are risk takers who actually like to try new things.

Pragmatists might be willing to use new technology, if it's the only way to get their problem solved.

Conservatives dislike new technology and try to avoid it.

Laggards pride themselves on the fact that they are the last to try anything new.

This drawing reflects the fact that there is no smooth or logical transition between the Early Adopters and the Pragmatists. In between the Early Adopters and the Pragmatists there is a chasm. To successfully sell your product to the Pragmatists, you must "cross the chasm".

The knowledge that the needs of early adopters and those of the majority of your potential user base differ significantly is extremely important when building and marketing any technology product. A lot of companies have ended up either building the wrong product or focusing their product too narrowly because they listened too intently to their initial customer base without realizing that they were talking to early adopters.

The fact is that early adopters have different problems and needs from regular users. This is especially true when you compare the demographics of the Silicon Valley early adopter crowd which "Web 2.0" startups often try to court with the typical users of social software on the Web. In the few years I've been working on building Web applications, I've seen a number of technology trends and products that have been heralded as the next big thing by technology pundits which actually never broke into the mainstream because they don't solve the problems of regular Internet users. Here are some examples

Blog Search: A few years ago, blog search engines were all the rage. You had people like Marc Cuban talking up IceRocket and Robert Scoble harranguing Web search companies to build dedicated blog search engines. Since then the products in that space have either given up the ghost (e.g. PubSub, Feedster), turned out to be irrelevant (e.g. Technorati, IceRocket) or were sidelined (e.g. Google Blog Search, Yahoo! Blog Search). The problem with this product category is that except for journalists, marketers and ego surfing A-list bloggers there aren't many people who need a specialized feature set around searching blogs.
Social bookmarking: Although del.icio.us popularized a number of "Web 2.0" trends such as tagging, REST APIs and adding social features to a previously individual task, it has never really taken off as a mainstream product. According to the former VC behind the service it seems to have peaked at 2 million unique visitors last year and is now seeing about half that number of unique users. Compare that to Yahoo! bookmarks which was seeing 20 million active users a year and a half ago.
RSS Readers: I've lost track of all of the this is the year RSS goes mainstream articles I've read over the past few years. Although RSS has turned out to be a key technology which powers a number of interesting functionality behind the scenes (e.g. podcasting) actually subscribing and reading news feeds in an RSS reader has not become a mainstream activity of Web users. When you think about it, it is kind of obvious. The problem an RSS reader solves is "I read so many blogs and news sites on daily basis, I need a tool to help me keep them all straight". How many people who aren't enthusiastic early adopters (i) have this problem and (ii) think they need a tool to deal with it?

These are just the first three that came to mind. I'm sure readers can come up with more examples of their own. This isn't to say that all hyped "Web 2.0" sites haven't lived up to their promise. Flickr is an example of an early adopter hyped site that showed up sprinkled with "Web 2.0" goodness that has become a major part of the daily lives of tens of millions of people across the Web.

When you look at the list of top 50 sites in the U.S. by unique visitors it is interesting to note what common theme unites the recent "Web 2.0" entrants into that list. There are the social networking sites like MySpace and Facebook which harness the natural need of young people to express their individuality yet be part of social cliques. Then there are the sites which provide lots of flexible options that enable people to share their media with their friends, family or the general public such as Flickr and YouTube. Both sites also have figured out how to harness the work of the few to entertain and benefit the many as have Wikipedia and Digg as well. Then there are sites like Fling and AdultFriendFinder which seem to now get more traffic than the personal sites you see advertised on TV for obvious reasons.

However the one overriding theme is that all of these recent entrants is that they solve problems that everyone [or at least a large section of the populace] has. Everyone likes to communicate with their social circle. Everyone likes watching funny videos and looking at couple pics. Everyone wants to find information about topics they interested in or find out what's going on around them. Everybody wants to get laid.

If you are a Web 2.0 company in today's Web you really need to ask yourselves, "Are we solving a problem that everybody has or are we building a product for Robert Scoble?"

Now Playing: Three 6 Mafia - I'd Rather (feat. DJ Unk)

Categories: Social Software | Technology

May 21, 2008

@ 02:20 PM

Comments [10]

C# 3.0 Implicit Type Declarations: To var or not to var?

Recently we got a new contributor on RSS Bandit who uses the ReSharper refactoring tool. This has led to a number of changes to our code base due to suggestions from ReSharper. For the most part, I've considered these changes to be benign until recently. A few weeks ago, I grabbed a recent snap shot of the code from SVN and was surprised to see all the type declarations in method bodies replaced by implicitly typed variable declarations (aka the var keyword). Below is an excerpt of what some of the "refactored" code now look likes.

var selectedIsChild = NodeIsChildOf(TreeSelectedFeedsNode, selectedNode);
var isSmartOrAggregated = (selectedNode.Type == FeedNodeType.Finder ||
                                      selectedNode.Type == FeedNodeType.SmartFolder);
//mark all viewed stories as read 
if (listFeedItems.Items.Count > 0){
   listFeedItems.BeginUpdate();

   for (var i = 0; i < listFeedItems.Items.Count; i++){
       var lvi = listFeedItems.Items[i];
       var newsItem = (INewsItem) lvi.Key;

I found this change to be somewhat puzzling since while it may have shortened the code by a couple of characters on each line but at the cost of making the code less readable. For example, I can't tell what the type of the variable named lvi is just by looking at the code.

I did some searching online to see how the ReSharper team justified this refactoring "suggestion" and came across the blog post entitled ReSharper vs C# 3.0 - Implicitly Typed Locals by a member of the ReSharper team which contains the following excerpt

Some cases where it seems just fine to suggest var are:

New object creation expression: var dictionary = new Dictionary<int, string>();

Cast expression: var element = (IElement)obj;

Safe Cast expression: var element = obj as IElement;

Generic method call with explicit type arguments, when return type is generic: var manager = serviceProvider.GetService<IManager>()

Generic static method call or property with explicit type arguments, when return type is generic: var manager = Singleton<Manager>.Instance;

However, various code styles may need to suggest in other cases. For example, programming in functional style with small methods can benefit from suggesting every suitable local variable to be converted to var style. Or may be your project has IElement root interface and you just know that every variable with "element" name is IElement and you don't want explicit types for this case. Probably, any method with the name GetTreeNode() always return ITreeNode and you want vars for all such local variable.

Currently, we have two suggestions: one that suggests every suitable explicitly typed local to be converted to implicitly typed var, and another that suggests according to rules above.

So it seems there are two suggestion modes. The first suggests using the var keyword when the name of the type is obvious from the right hand side expression being evaluated such as casts or new object creation. The second mode suggests replacing type declarations with the var keyword anywhere the compiler can infer the type [which is pretty much everywhere except for a few edge cases such as when you want to initialize a variable with null at declaration].

The first suggestion mode makes sense to me since the code doesn't lose any clarity and it makes for shorter code. The second mode is the one I find problematic it takes information out of the equation to save a couple of characters per line of code. Each time someone is reading the code, they need to resort to using Go To Definition or Auto-Complete features of their IDE to tell something as simple as "what is the type of this object".

Unfortunately, the ReSharper developers seem to have dug in their heels religiously on this topic as can be seen in the post entitled Varification -- Using Implicitly Typed Locals where a number of arguments are made justifying always using implicitly typed variables including

It induces better naming for local variables.
It induces better API.
It induces variable initialization.
It removes code noise
It doesn't require using directive.

It's interesting how not only are almost all of these "benefits" mainly stylistic but how they contradict each other. For example, the claim that it leads to "better naming for local variables" really means it compels developers to use LONGER HUNGARIAN STYLE VARIABLE NAMES. Funny enough, these long variable names add more noise to the code overall since they show up everywhere the variable is used compared to a single type name showing up when the variable is declared. The argument that it leads to "better API" is another variation of this theme since it argues that if you are compelled to use LONGER MORE DESCRIPTIVE PROPERTY NAMES (e.g. XmlNode.XmlNodeName instead of XmlNode.Name) then this is an improvement. Someone should inform the ReSharper folks that encoding type information in variable names sucks, that's why we're using a strongly typed programming language like C# in the first place.

One more thing, the claim that it encourages variable initialization is weird given that the C# compiler already enforces that. More importantly, the common scenario of initializing a variable to null before it is used isn't supported by the var keyword.

In conclusion, it seems to me that someone on the ReSharper team went overboard in wanting to support the new features in C# 3.0 in their product even though in some cases using these features cause more problems than they solve. Amusingly enough, the C# 3.0 language designers foresaw this problem and put the following note in the C# language reference

Remarks: Overuse of var can make source code less readable for others. It is recommended to use var only when it is necessary, that is, when the variable will be used to store an anonymous type or a collection of anonymous types.

Case closed.

Now Playing: Mariah Carey - Side Effects (featuring Young Jeezy)

Categories: Programming

May 17, 2008

@ 12:52 PM

Comments [3]

Two Key Issues that often Hinder Collaboration Between Teams in Large Companies

I've spent all of my professional career working at a large multinational company. In this time I've been involved in lots of different cross-team and cross-divisional collaboration efforts. Some times these groups were in the same organization and other times you would have to go up five to ten levels up the org chart before you found a shared manager. Surprisingly, the presence or lack of shared management has never been the key factor that has helped or hindered such collaborative efforts.

Of all the problems I've seen when I've had to depend on other teams for help in getting a task accomplished or vice versa; there have been two insidious that tend to crop up in situations where things go awry. The first is misaligned goals. Just because two groups are working together doesn't mean they have the same motivations or expected end results. Things quickly go awry when one group's primary goals either run counter to the goal(s) of the group they are supposed to be collaborating with. For example, consider a company that requires its technical support to have very low average call time to meet their metrics. Imagine that same company also puts together a task force to improve the customer satisfaction with the technical support experience after lots of complaints from their customers. What are the chances that the task force will be able to effect positive change if the metrics used to reward their tech support staff remain the same? The funny thing is that large companies often end up creating groups that are working at cross purposes yet are supposed to be working together.

What makes misaligned goals so insidious is that the members of the collaborating groups who are working through the project often don't realize that the problem is that their goals are misaligned. A lot of the time people tend to think the problem is the other group is evil, a bunch of jerks or just plain selfish. The truth is often that the so-called jerks are really just thinking You're not my manager, so I'm not going to ask how high when you tell me to jump. Once you find out you've hit this problem then the path to solving it is clear. You either have to (i) make sure all collaborating parties want to reach the same outcome and place have similar priorities or (ii) jettison the collaboration effort.

Another problem that has scuttled many a collaboration effort is when one or more of the parties involved has undisclosed concerns about the risks of collaborating which prevents them from entering into the collaboration wholeheartedly or even worse has them actively working against it. Software development teams experience this when they have to manage dependences on their project or that they have on other projects. There's a good paper on the topic entitled Managing Cognitive and Affective Trust in the Conceptual R&D Organization by Diane H. Sonnenwald which breaks down the problem of distrust in conceptual organizations (aka virtual teams) in the following way

Two Types of Trust and Distrust: Cognitive and Affective
Two types of trust, cognitive and affective, have been identified as important in organizations (McAllister, 1995; Rocco, et al, 2001). Cognitive trust focuses on judgments of competence and reliability. Can a co-worker complete a task? Will the results be of sufficient quality? Will the task be completed on time? These are issues that comprise cognitive trust and distrust. The more strongly one believes the answers to these types of questions are affirmative, the stronger one’s cognitive trust. The more strongly one believes the answers to these types of questions are negative, the stronger one’s cognitive distrust.

Affective trust focuses on interpersonal bonds among individuals and institutions, including perceptions of colleagues’ motivation, intentions, ethics and citizenship. Affective trust typically emerges from repeated interactions among individuals, and experiences of reciprocated interpersonal care and concern (Rosseau, et al, 1998). It is also referred to as emotional trust (Rocco, et al, 2001) and relational trust (Rosseau, et al, 1998). It can be “the grease that turns the wheel” (Sonnenwald, 1996).

The issue of affective distrust is strongly related to lacking shared goals while working together as a team which I've already discussed. Cognitive distrust typically results in one or more parties in the collaboration acting with the assumption that the collaboration is going to fail. Since these distrusting group(s) assume failure will be the end result of the collaboration they will take steps to insulate themselves from this failure. However what makes this problem insidious is that the "untrusted" groups are often not formally confronted about the lack of trust in their efforts and thus risk mitigation is not formally built into the collaboration effort. Eventually this leads to behavior that is counterproductive to the collaboration as teams try to mitigate risks in isolation and eventually there is distrust between all parties in the collaboration. Project failure often soon follows.

The best way to prevent this from happening once you find yourself in this situation is to put everyone's concerns on the table. Once the concerns are on the table, be they concerns about product quality, timelines or any of the other myriad issues that impact collaboration, mitigations can be put in place. As the saying goes sunlight is the best disinfectant, thus I've also seen that when the "distrusted" team becomes fully transparent in their workings and information disclosure it quickly makes matters clear. Because one of two things will happen; it will either (i) reassure their dependents that their fears are unfounded or (ii) confirm their concerns in a timely fashion. Either of which is preferable to the status quo.

Now Playing: Mariah Carey - Cruise Control (featuring Damian Marley)

Categories: Life in the B0rg Cube

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Friday, 30 May 2008 - Dare Obasanjo's weblog

Consider the messaging problem:

A 6000X multiplication factor: