I've been having problems with hard drive space for years. For some reason, I couldn't shake the feeling that I had less available space on my hard drive than I could account for. I'd run programs like FolderSizes, and after doing some back-of-the-envelope calculations it would seem like I should have gigabytes more free space than my operating system actually reported as available.

Recently I stumbled on a blog post by Darien Nagle which claimed to answer the question "Where's my hard disk space gone?" with the recommendation that his readers try WinDirStat. Seeing as I had nothing to lose, I gave it a shot and definitely came away satisfied. After a quick install, it didn't take long for the application to track down where all those gigabytes of storage I couldn't account for had gone. It turned out there was a hidden folder named C:\RECYCLER that was taking up 4 gigabytes of space.

I thought that was kind of weird, so I looked up the folder name and found Microsoft KB 229041 - Files Are Not Deleted From Recycler Folder, which listed the following symptoms:

SYMPTOMS
When you empty the Recycle Bin in Windows, the files may not be deleted from your hard disk.

NOTE: You cannot view these files using Windows Explorer, My Computer, or the Recycle Bin.

I didn't even have to go through the complicated procedure in the KB article to delete the files; I just deleted them directly from the WinDirStat interface.

My only theory as to how this happened is that some data got orphaned when I upgraded my desktop from Windows XP to Windows 2003, since the user accounts that created the files were lost. I guess simply deleting the files from Windows Explorer, as I did a few years ago, wasn't enough.

Good thing I finally found a solution. I definitely recommend WinDirStat, the visualizations aren't half bad either.

Now Playing: Eminem - Never Enough (feat. 50 Cent & Nate Dogg)


 

Categories: Technology

June 1, 2008
@ 01:46 PM

A coworker forwarded me a story from a Nigerian newspaper about a cat turning into a woman in Port Harcourt, Nigeria. The story is excerpted below

This woman was reported to have earlier been seen as a cat before she reportedly turned into a woman in Port Harcourt, Rivers State, on Thursday. Photo: Bolaji Ogundele. WHAT could be described as a fairy tale turned real on Wednesday in Port Harcourt, Rivers State, as a cat allegedly turned into a middle-aged woman after being hit by a commercial motorcycle (Okada) on Aba/Port Harcourt Expressway.

Nigerian Tribune learnt that three cats were crossing the busy road when the okada ran over one of them which immediately turned into a woman. This strange occurrence quickly attracted people around who descended on the animals. One of them, it was learnt, was able to escape while the third one was beaten to death, still as a cat though.

Another witness, who gave his name as James, said the woman started faking when she saw that many people were gathering around her. “I have never seen anything like this in my life. I saw a woman lying on the road instead of a cat. Blood did not come out of her body at that time. When people gathered and started asking her questions, she pretended that she did not know what had happened," he said.

Reading this reminds me how commonplace it was in Nigeria to find the kind of mind-boggling supernatural stories you'd expect to see in the Weekly World News printed in regular newspapers, alongside the sports, political and stock market news. Unlike the stories of alien abduction you find in the U.S., the newspaper stories of supernatural events often had witnesses and signed confessions from the alleged perpetrators of supernatural acts. Nobody doubted these stories; everyone knew they were true. Witches would confess to being behind the run of bad luck of their friends and family, or confess that the key to their riches was offering their family members or children as blood sacrifices to ancient gods. It was all stuff I read in the daily papers as a kid while flipping through looking for the comics.

The current issue of Harper's has an essay about the penis snatching hysteria from my adolescent years. The story is summarized in Slate as shown below

Harper's, June 2008
An essay reflects on the widespread reports of "magical penis loss" in Nigeria and Benin, in which sufferers claim their genitals were snatched or shrunken by thieves. Crowds have lynched accused penis thieves in the street. During one 1990 outbreak, "[m]en could be seen in the streets of Lagos holding on to their genitalia either openly or discreetly with their hand in their pockets." Social scientists have yet to identify what causes this mass fear but suspect it is what is referred to as a "culture-bound syndrome," a catchall term for a psychological affliction that affects people within certain ethnic groups.

I remember that time fairly well. I can understand that this sounds like the kind of bogeyman story that fills every culture. In rural America, it is aliens in flying saucers kidnapping people for anal probes and mutilating cows. In Japan, it's the shape-changing foxes (kitsune). In Nigeria, we had witches who snatched penises and could change shape at will.

In the cold light of day it sounds like mass hysteria, but I sometimes wonder which is easier to believe: that a bunch of strangers on the street had a mass hallucination that a cat transformed into a woman, or that there really are supernatural things out there beyond modern science's understanding?

Now Playing: Dr. Dre - Natural Born Killaz (feat. Ice Cube)


 

When Google Gears was first announced, it was praised as the final nail in the coffin for desktop applications since it made it possible to take Web applications offline. However, in the year that Gears has existed there hasn't been as much progress or enthusiasm in taking applications offline as was initially expected when Gears was announced. There are various reasons for this, and since I've already given my thoughts on taking Web applications offline I won't repeat myself in this post. What I do find interesting is that many proponents of Google Gears, including technology evangelists at Google, have been gradually switching to pushing Gears as a way to route around browsers and add features to the Web, as opposed to just being an 'offline solution'. Below are some posts from the past couple of months showing this gradual switch in positioning.

Alex Russell of the Dojo Toolkit wrote a blog post entitled Progress Is N+1 in March of this year that contained the following excerpt

Every browser that we depend on either needs an open development process or it needs to have a public plan for N+1. The goal is to ensure that the market knows that there is momentum and a vehicle+timeline for progress. When that’s not possible or available, it becomes incumbent on us to support alternate schemes to rev the web faster. Google Gears is our best hope for this right now, and at the same time as we’re encouraging browser venders to do the right thing, we should also be championing the small, brave Open Source team that is bringing us a viable Plan B. Every webdev should be building Gear-specific features into their app today, not because Gears is the only way to get something in an app, but because in lieu of a real roadmap from Microsoft, Gears is our best chance at getting great things into the fabric of the web faster. If the IE team doesn’t produce a roadmap, we’ll be staring down a long flush-out cycle to replace it with other browsers. The genius of Gears is that it can augment existing browsers instead of replacing them wholesale. Gears targets the platform/product split and gives us a platform story even when we’re neglected by the browser vendors.

Gears has an open product development process, an auto-upgrade plan, and a plan for N+1.

At this point in the webs evolution, I’m glad to see browser vendors competing and I still feel like that’s our best long-term hope. But we’ve been left at the altar before, and the IE team isn’t giving us lots of reasons to trust them as platform vendors (not as product vendors). For once, we have an open, viable Plan B.

Gears is real, bankable progress.

This was followed up by a post from Dion Almaer, a technical evangelist at Google, who wrote the following in his post Gears as a bleeding-edge HTML 5 implementation

I do not see HTML 5 as competition for Gears at all. I am sitting a few feet away from Hixie’s desk as I write this, and he and the Gears team have a good communication loop.

There is a lot in common between Gears and HTML 5. Both are moving the Web forward, something that we really need to accelerate. Both have APIs to make the Web do new tricks. However HTML 5 is a specification, and Gears is an implementation.
...
Standards bodies are not a place to innovate, else you end up with EJB and the like.
...
Gears is a battle hardened Web update mechanism, that is open source and ready for anyone to join and take in interesting directions.

So what do Web developers actually think about using Google's technology as a way to "upgrade the Web" instead of relying on Web browsers and standards bodies for the next generation of features for the Web? Here's one answer from Matt Mullenweg, founder of WordPress, taken from his post Infrastructure as Competitive Advantage

When a website “pops” it probably has very little to do with their underlying server infrastructure and a lot to do with the perceived performance largely driven by how it’s coded at the HTML, CSS, and Javascript level. This, incidentally, is one of the reasons Google Gears is going to change the web as we know it today - LocalServer will obsolete CDNs as we know them. (Look for this in WordPress soonish.)

That's a rather bold claim (pun intended) by Matt. If you're wondering what features Matt is adding to WordPress that will depend on Gears, they were recently discussed in Dion Almaer's post Speed Up! with Wordpress and Gears which is excerpted below

WordPress 2.6 and Google Gears

However, Gears is so much more than offline, and it is really exciting to see “Speed Up!” as a link instead of “Go Offline?”

This is just the beginning. As the Gears community fills in the gaps in the Web development model and begins to bring you HTML5 functionality I expect to see less “Go Offline” and more “Speed Up!” and other such phrases. In fact, I will be most excited when I don’t see any such linkage, and the applications are just better.

With an embedded database, local server storage, worker pool execution, desktop APIs, and other exciting modules such as notifications, resumable HTTP being talked about in the community…. I think we can all get excited.

Remember all those rumors back in the day that Google was working on their own browser? Well they've gone one better and are working on the next Flash. Adobe likes pointing out that Flash has more market share than any single browser, and we all know that Flash has gone above and beyond the [X]HTML standards bodies to extend the Web, thus powering popular, rich user experiences that weren't possible otherwise (e.g. YouTube). Google is on the road to doing the same thing with Gears. And just like social networks and content sharing sites played a big part in making Flash an integral part of the Web experience for a majority of Web users, Google is learning from history with Gears, as can be seen from the recent announcements from MySpace. I expect we'll soon see Google leverage the popularity of YouTube as another vector to spread Google Gears.

So far none of the Web sites promoting Google Gears have required it which will limit its uptake. Flash got ahead by being necessary for sites to even work. It will be interesting to see if or when sites move beyond using Gears for nice-to-have features and start requiring it to function. It sounds crazy but I never would have expected to see sites that would be completely broken if Flash wasn't installed five years ago but it isn't surprising today (e.g. YouTube).

PS: For those who are not familiar with the technical details of Google Gears, it currently provides three main bits of functionality: thread pools for asynchronous operations, access to a SQL database running on the user's computer, and access to the user's file system for storing documents, images and other media. There are also beta APIs which provide more access to the user's computer from the browser, such as the Desktop API which allows applications to create shortcuts on the user's desktop.

Now Playing: Nas - It Ain't Hard To Tell


 

Categories: Platforms | Web Development

I started thinking about the problems inherent in social news sites recently due to a roundabout set of circumstances. Jeff Atwood wrote a blog post entitled It's Clay Shirky's Internet, We Just Live In It which linked to a post he made in 2005 titled A Group Is Its Own Worst Enemy which linked to my post on the issues I'd seen with kuro5hin, an early social news site which attempted to "fix" the problems with Slashdot's model but lost its technology focus along the way.

A key fact about online [and offline] communities is that the people who participate the most eventually influence the direction and culture of the community. Kuro5hin tried to fix two problems with Slashdot: the lack of a democratic voting system and the focus on mindless link propagation instead of deeper, more analytical articles. I mentioned how this experiment ended up in my post Jeff linked to, which is excerpted below

Now five years later, I still read Slashdot every day but only check K5 every couple of months out of morbid curiosity. The democracy of K5 caused two things to happen that tended to drive away the original audience. The first was that the focus of the site ended up not being about technology, mainly because it is harder for people to write technology articles than to write about everyday topics that are nearer and dearer to their hearts. Another was that there was a steady influx of malicious users who eventually drove away a significant proportion of K5's original community, many of whom migrated to HuSi. This issue is lamented all the time on K5 in comments such as "an exercise for rusty and the editors" and "You don't understand the nature of what happened".

Besides the malicious users, one of the other interesting problems we had on K5 was that the number of people who actually did things like rate comments was very small relative to the number of users on the site. Anytime proposals came up for ways to fix these issues, there would often be someone who dismissed the idea by stating that we were "seeking a technical solution to a social problem". This interaction between technology and social behavior was the first time I really thought about social software.

The common theme underscoring both problems that hit the site is that they are all due to the cost of participation. It is easier to participate if you are writing about politics during an election year than if you have to write some technical article about the feasibility of adding garbage collection to C++ or analysis of distributed computing technologies. So users followed the path of least resistance. Similarly, cliques of malicious users and trolls have lots of time on their hands by definition and Kuro5hin never found a good way to blunt their influence. Slashdot's system of strong editorial control and meta-moderation of comment ratings actually turned out to be strengths compared to kuro5hin's more democratic and libertarian approach.

This line of thinking leads me to Giles Bowkett's very interesting thoughts about social news sites like Slashdot, Digg and Reddit in his post Summon Monsters? Open Door? Heal? Or Die?, where he wrote

A funny thing about these sites is that they know about this problem. Hacker News is very concerned about not turning into the next Reddit; Reddit was created as a better Digg; and Digg's corporate mission statement is "at least we're not Slashdot." None of them seem to realize that the order from least to most horrible is identical to the order from youngest to oldest, or that every one of them was good once and isn't any longer.
...
When you build a system where you get points for the number of people who agree with you, you are building a popularity contest for ideas. However, your popularity contest for ideas will not be dominated by the people with the best ideas, but the people with the most time to spend on your web site. Votes appear to be free, like contribution is with Wikipedia, but in reality you have to register to vote, and you have to be there frequently for your votes to make much difference. So the votes aren't really free - they cost time. If you do the math, it's actually quite obvious that if your popularity contest for ideas inherently, by its structure, favors people who waste their own time, then your contest will produce winners which are actually losers. The most popular ideas will not be the best ideas, since the people who have the best ideas, and the ability to recognize them, also have better things to do and better places to be.

Even if you didn't know about the long tail, you'd look for the best ideas on Hacker News (for example) not in its top 10 but in its bottom 1000, because any reasonable person would expect this effect - that people who waste their own time have, in effect, more votes than people who value it - to elevate bad but popular ideas and irretrievably sink independent thinking. And you would be right. TechCrunch is frequently in HN's top ten.

I agree with everything excerpted above except for the implication that all of these sites want to be "better" than their predecessors. I believe that Digg simply wants to be more popular (i.e. garner more page views) than its predecessors and competitors. If the goal of a site is to generate page views then there is nothing wrong with a popularity contest. However, the most popular ideas are hardly ever the best ideas; they are often simply the most palatable to the target audience.

As a user, being popular in such online communities requires two things: being prolific and knowing your audience. If you know your audience, it isn't hard to consistently generate ideas that will be popular with them. And once you start generating content on a regular basis, you eventually become an authority. This is what happened with MrBabyMan of Digg (and all the other Top Diggers), who has submitted thousands of articles to the site and voted on tens of thousands of articles. This is also what happened with Signal 11 of Slashdot almost a decade ago (damn, I'm getting old). In both the case of MrBabyMan (plus other Top Diggers) and Signal 11, some segment of the user base eventually cottoned on to the fact that participation in a social news site is a game and rallied against the users who were "winning" the game. Similarly, in both cases the managers of the community tried to blunt the rewards of being a high scorer - in Slashdot's case with the institution of the karma cap, while Digg did it by getting rid of the list of top Diggers.

Although turning participation in your online community into a game complete with points and a high score table is a good tactic to gain an initial set of active users, it does not lead to a healthy or diverse community in the long run. Digg and Slashdot both eventually learned this and have attempted to fix it in their own ways.

Social news sites like Reddit & Digg also have to contend with the fact that the broader their audience gets the less controversial and original their content will be since the goal of such sites is to publish the most broadly popular content on the front page. Additionally, ideas that foster group think will gain in popularity as the culture and audience of the site congeals. Once that occurs, two things will often happen to the site (i) growth will flatten out since there is now a set audience and culture for the site and (ii) the original crop of active users will long for the old days and gripe a lot about how things have changed. This has happened to Slashdot, Kuro5hin, Reddit and every other online community I've watched over time.

This cycle is a fundamental flaw of social news sites and it will always repeat itself because A Group Is Its Own Worst Enemy.

Now Playing: Notorious B.I.G. - You're Nobody (Til Somebody Kills You)


 

Categories: Social Software

A few months ago Michael Mace, former Chief Competitive Officer and VP of Product Planning at Palm, wrote an insightful eulogy for mobile application platforms entitled Mobile applications, RIP where he wrote

Back in 1999 when I joined Palm, it seemed we had the whole mobile ecosystem nailed. The market was literally exploding, with the installed base of devices doubling every year, and an incredible range of creative and useful software popping up all over. In a 22-month period, the number of registered Palm developers increased from 3,000 to over 130,000. The PalmSource conference was swamped, with people spilling out into the halls, and David Pogue took center stage at the close of the conference to tell us how brilliant we all were.

It felt like we were at the leading edge of a revolution, but in hindsight it was more like the high water mark of a flash flood.
...

Two problems have caused a decline in the mobile apps business over the last few years. First, the business has become tougher technologically. Second, marketing and sales have also become harder.

From the technical perspective, there are a couple of big issues. One is the proliferation of operating systems. Back in the late 1990s there were two platforms we had to worry about, Pocket PC and Palm OS. Symbian was there too, but it was in Europe and few people here were paying attention. Now there are at least ten platforms. Microsoft alone has several -- two versions of Windows Mobile, Tablet PC, and so on. [Elia didn't mention it, but the fragmentation of Java makes this situation even worse.]

I call it three million platforms with a hundred users each (link).  

...
In the mobile world, what have we done? We created a series of elegant technology platforms optimized just for mobile computing. We figured out how to extend battery life, start up the system instantly, conserve precious wireless bandwidth, synchronize to computers all over the planet, and optimize the display of data on a tiny screen.

But we never figured out how to help developers make money. In fact, we paired our elegant platforms with a developer business model so deeply broken that it would take many years, and enormous political battles throughout the industry, to fix it -- if it can ever be fixed at all.

So what does this have to do with social networking sites? The first excerpt from the post where it talks about 130,000 registered developers for the Palm OS sounds a lot like the original hype around the Facebook platform with headlines screaming Facebook Platform Attracts 1000 Developers a Day.

The second excerpt talks about the point when there were two big mobile platforms to worry about, which is analogous to the appearance of Google's OpenSocial on the scene as a competing platform backed by a consortium of Facebook's competitors. This means that widget developers like Slide and RockYou have to target one set of APIs when building widgets for MySpace, LinkedIn & Orkut and a completely different set of APIs when building widgets for Facebook & Bebo. Things will likely only get worse. One reason for this is that despite API standardization, not all of these sites have the same features. Facebook has a Web-based IM; Bebo does not. Orkut has video uploads; LinkedIn does not. All of these differences eventually creep into such APIs as "vendor extensions". The fact that both Google and Facebook are also shipping Open Source implementations of their platforms (Shindig and fbOpen) makes it even more likely that the social networking sites will find ways to extend these platforms to suit their needs.

Finally, there's the show-me-the-money problem. It still isn't clear how one makes money out of building on these social networking platforms. Although companies like Photobucket and Slide have gotten quarter-billion and half-billion dollar valuations, these have all been typical "Web 2.0" zero-profit valuations. This implies that platform developers don't really make money but instead are simply trying to gather a lot of eyeballs and then flip to some big company with lots of cash and no ideas. Basically it's a VC-funded lottery system. This doesn't sound like the basis of a successful and long-lived platform such as what we've seen with Microsoft's Windows and Office platforms or Google's Search and AdWords ecosystem. In those platforms there are actually ways for companies to make money by adding value to the ecosystem, which seems more sustainable in the long run than what we have today in the various social networking widget platforms.

It will be interesting to see if history repeats itself in this instance.

Now Playing: Rick Ross - Luxury Tax (featuring Lil Wayne, Young Jeezy and Trick Daddy)


 

Categories: Platforms | Social Software

Last month Anil Dash commented on recent announcements about Microsoft's strategy around Web protocols in a post entitled Atom Wins: The Unified Cloud Database API where he wrote

AtomPub has become the standard for accessing cloud-based data storage. From Google's GData implementation to Microsoft's unified storage platform that includes SQL Server, Atom is now a consistent interface to build applications that can talk to remote databases, regardless of who provides them. And this milestone has come to pass with almost no notice from the press that covers web APIs and technology.

I don't think the Atom Publishing Protocol can be considered the universal protocol for talking to remote databases given that cloud storage vendors like Amazon and database vendors like Oracle don't support it yet. That said, this is definitely a positive trend. Back in the RSS vs. Atom days I used to get frustrated that people were spending so much time reinventing the wheel with an RSS clone when the real gaping hole in the infrastructure was a standard editing protocol. It took a little longer than I expected (Sam Ruby started talking about it in 2003) but the effort has succeeded way beyond my wildest dreams. All I wanted was a standard editing protocol for blogs and content management systems, and we've gotten so much more.
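To make concrete what "a standard editing protocol" means in practice, here is a minimal sketch of the AtomPub "create" interaction: the client builds an Atom entry document and POSTs it to a collection URI. The collection URI and entry contents below are made up for illustration; only the Atom namespace and media type are from the spec.

```python
# Build a bare-bones Atom entry document, the payload an AtomPub client
# POSTs to a collection in order to create a new member (e.g. a blog post).
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def build_entry(title, content):
    """Serialize a minimal Atom entry with a title and plain-text content."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    entry = ET.Element("{%s}entry" % ATOM_NS)
    ET.SubElement(entry, "{%s}title" % ATOM_NS).text = title
    c = ET.SubElement(entry, "{%s}content" % ATOM_NS, {"type": "text"})
    c.text = content
    return ET.tostring(entry, encoding="unicode")

body = build_entry("Hello AtomPub", "Posted via the Atom Publishing Protocol")

# The wire-level request would look roughly like this (the collection URI
# /blog/entries is hypothetical); the server replies 201 Created with a
# Location header pointing at the new entry:
#
#   POST /blog/entries HTTP/1.1
#   Host: example.org
#   Content-Type: application/atom+xml;type=entry
#
#   <entry xmlns="http://www.w3.org/2005/Atom">...</entry>
```

The same entry format and POST/PUT/DELETE verbs work against any conforming server, which is exactly why the protocol generalizes from blogs to "remote databases".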

Microsoft is using AtomPub as the interface to a wide breadth of services and products, as George Moore points out in his post A Unified Standards-Based Protocols and Tooling Platform for Storage from Microsoft, which states

For the first time ever we have a unified protocol and developer tooling story across most of our major storage products from Microsoft:

The unified on-the-wire format is Atom, with the unified protocol being AtomPub across all of the above storage products and services. For on-premises access to SQL Server, placing an AtomPub interface on top of your data + business logic is ideal so that you can easily expose that end point to everybody that wants to consume it, whether they are inside or outside your corporate network.

And a few weeks after George's post even more was revealed in posts such as this one about FeedSync and Live Mesh, where we find out

Congratulations to the Live Mesh team, who announced their Live Mesh Technology Preview release earlier this evening! Amit Mital gives a detailed overview in this post on http://dev.live.com.

You can read all about it in the usual places...so why do I mention it here? FeedSync is one of the core parts of the Live Mesh platform. One of the key values of Live Mesh is that your data flows to all of your devices. And rather than being hidden away in a single service, any properly authenticated user has full bidirectional sync capability. As I discussed in the Introduction to FeedSync, this really makes "your stuff yours".

Okay, FeedSync isn't really AtomPub but it does use the Atom syndication format so I count that as a win for Atom+APP as well. As time goes on, I hope we'll see even more products and services that support Atom and AtomPub from Microsoft. Standardization at the protocol layer means we can move innovation up the stack. We've seen this happen with HTTP and XML, now hopefully we'll see it happen with Atom + AtomPub.

Finally, I think it's been especially cool that members of the AtomPub community are seeing positive participation from Microsoft, thanks to the folks on the Astoria team who are building ADO.NET Data Services (AtomPub interfaces for interacting with on-site and cloud-based SQL Server databases). Kudos to Pablo, Mike Flasko, Andy Conrad and the rest of the Astoria crew.

Now Playing: Mario - Crying Out For Me (remix)


 

Categories:

In response to my recent post about Twitter's architectural woes, Robert Scoble penned a missive entitled Should services charge “super users”? where he writes

First of all, Twitter doesn’t store my Tweets 25,000 times. It stores them once and then it remixes them. This is like saying that Exchange stores each email once for each user. That’s totally not true and shows a lack of understanding how these things work internally.

Second of all, why can FriendFeed keep up with the ever increasing load? I have 10,945 friends on FriendFeed (all added in the past three months, which is MUCH faster growth than Twitter had) and it’s staying up just fine.

My original post excerpted a post by Israel L'Heureux where he pointed out that there are two naive design choices one could make when designing a system like Twitter. When Robert Scoble posts a tweet, you could either

    1. PUSH the message to the queues of each of his 24,875 followers, or
    2. Wait for the 24,875 followers to log in, then PULL the message.

Israel assumed it was #1, which has its own set of pros and cons. The pro is that fetching a user's inbox is fast and the inbox can easily be cached. The con is that the write I/O is ridiculous. Robert Scoble's post excerpted above argues that Twitter must be using #2 since that is what messaging systems like Microsoft Exchange do.
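The trade-off between the two designs is easiest to see side by side. Here is a toy sketch of both models; the class and field names are made up, and real systems would add caching, pagination and persistence:

```python
# PUSH: copy a tweet into every follower's inbox at write time.
# PULL: store it once and assemble each follower's timeline at read time.
from collections import defaultdict

class PushTwitter:
    def __init__(self):
        self.inboxes = defaultdict(list)   # follower -> copies of messages
        self.followers = defaultdict(set)  # author -> their followers

    def post(self, author, text):
        for f in self.followers[author]:   # write fan-out: O(#followers)
            self.inboxes[f].append((author, text))

    def timeline(self, user):
        return self.inboxes[user]          # read is a cheap single lookup

class PullTwitter:
    def __init__(self):
        self.messages = defaultdict(list)  # author -> messages, stored once
        self.following = defaultdict(set)  # user -> authors they follow

    def post(self, author, text):
        self.messages[author].append((author, text))  # single write

    def timeline(self, user):
        # read fan-in: gather from everyone the user follows
        return [m for a in self.following[user] for m in self.messages[a]]
```

With 24,875 followers, `PushTwitter.post` performs 24,875 inbox writes per tweet, while `PullTwitter.post` performs one write but pushes all the cost onto `timeline`.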

The feature of Exchange that Robert is talking about is called Single Instance Storage and is described in Microsoft knowledge base article 175706 as follows

If a message is sent to one recipient, and if the message is copied to 20 other recipients who reside in the same mailbox store, Exchange Server maintains only one copy of the message in its database. Exchange Server then creates pointers.
...
For example, assume the following configuration:
Server 1 Mailbox Store 1: Users A, B, and C
Server 2 Mailbox Store 1: User D

When User A sends the same message to User B, User C, and User D, Exchange Server creates a single instance of the message on server 1 for all three users, because User A, User B, and User C are on the same server. Even the message that is in User A's Sent Items folder is a single instance of the messages that is in the Inboxes of User B and User C. Because User D is on a different server, Exchange Server sends a message to server 2 that will be stored on that server.
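The pointer scheme the KB article describes can be sketched as a toy model: each mailbox store keeps one copy of a message body, and each recipient's inbox holds only a pointer to it. The two-store layout below mirrors the KB example (Users A, B and C on one store, User D on another); all names are illustrative.

```python
# Toy single-instance storage: one message body per mailbox store,
# with per-recipient pointers into that store.
import itertools

class MailboxStore:
    def __init__(self):
        self.bodies = {}    # message_id -> body (stored once per store)
        self.inboxes = {}   # user -> list of message_ids (the "pointers")
        self._ids = itertools.count()

    def deliver(self, body, recipients):
        msg_id = next(self._ids)
        self.bodies[msg_id] = body  # single copy within this store
        for user in recipients:
            self.inboxes.setdefault(user, []).append(msg_id)
        return msg_id

store1, store2 = MailboxStore(), MailboxStore()
homes = {"A": store1, "B": store1, "C": store1, "D": store2}

def send(sender, body, recipients):
    # Group recipients by their home store; each store gets one body copy.
    by_store = {}
    for user in recipients:
        by_store.setdefault(homes[user], []).append(user)
    for store, users in by_store.items():
        store.deliver(body, users)

send("A", "quarterly report", ["B", "C", "D"])
# store1 now holds one copy shared by B and C; store2 holds its own copy for D.
```

Note how single instancing only helps within a store: crossing a store (or server) boundary always costs another copy, which is the crux of the discussion that follows.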

Now that we know how Exchange's single instance storage works, we can talk about how it helps or hinders the model where users pull messages from all the people they are following when they log in. A pull model assumes that you have a single instance of each message sent by a Twitter user, and then when Robert Scoble logs in you execute a query of the form

SELECT TOP 20 messages.author_id, messages.body, messages.publish_date
FROM messages
WHERE messages.author_id IN (SELECT following.id FROM following WHERE user_id = $scoble_id)
ORDER BY messages.publish_date DESC;

In this model there is only ever one instance of a message and it lives in the messages table. Where does this break down? Well, what happens when you have more than one database per server, or more than one set of database servers, or more than one data center? This is why Exchange still has multiple message instances per mailbox store, even though it does have single instances shared by all recipients of the message within each mailbox store on a single server.
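To see why partitioning breaks the pure pull model, consider what the query above becomes once the messages table is sharded: every timeline read turns into a scatter-gather across each shard that holds an author the user follows. Here is a small sketch; the hash-based sharding scheme and four-shard layout are made up for illustration.

```python
# Once messages are partitioned, a pull-model timeline read must query
# every shard (a network round trip each, in a real system) and merge.
import heapq

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # shard -> {author: [(ts, text)]}

def shard_for(author):
    return hash(author) % NUM_SHARDS

def post(author, ts, text):
    # A message is still written exactly once, to its author's shard.
    shards[shard_for(author)].setdefault(author, []).append((ts, text))

def timeline(following, limit=20):
    # Scatter: run the "messages by these authors" query on every shard...
    partials = []
    for s in shards:
        rows = [m for a in following if a in s for m in s[a]]
        partials.append(sorted(rows, reverse=True))  # newest first per shard
    # ...gather: merge the pre-sorted per-shard results, newest first.
    return list(heapq.merge(*partials, reverse=True))[:limit]
```

The write side stays cheap, but a user following tens of thousands of people now pays for queries against many shards on every single page load, which is the read-I/O problem described below.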

How much does having multiple databases or multiple DB servers affect the benefits of single instancing of messages? I'll let the Exchange team answer that with their blog post from earlier this year titled Single Instance Storage in Exchange 2007 

Over the years, the importance of single instance storage (SIS) in Exchange environments has gradually declined. Ten years ago, Exchange supported about 1000 users per server, with all users on a single database. Today, a typical mailbox server might have 5000 users spread over 50 databases. With one-tenth as many users per database (100 users vs. 1000 users), the potential space savings from single-instancing is reduced by 10x. The move to multiple databases per server, along with the reduction in space savings due to items being deleted over time, means that the space savings from SIS are quite small for most customers. Because of this, we've long recommend that customers ignore SIS when planning their storage requirements for Exchange.
...
Looking forward
Given current trends, we expect the value of single instance storage to continue to decline over time. It's too early for us to say whether SIS will be around in future versions of Exchange. However, we want everyone to understand that it is being deemphasized and should not be a primary factor in today's deployment or migration plans.

At the end of the day, the cost of a naive implementation of single instancing in a situation like Twitter's is latency and ridiculous read I/O. You will have to query data from tens of thousands of users across multiple databases every time Robert Scoble logs in.

Exchange gets around the latency problem with the optimization of always having an instance of the message in each database. So you end up with a mix of both a pull & push model. If Scoble blasts a message out to 25,000 recipients on an Exchange system such as the one described in the blog post by the Exchange team, it will end up being written 250 times across 5 servers (100 users per database, 5000 users per server) instead of 25,000 times as would happen in a naive implementation of the PUSH model. Then when each person logs in to view their messages, they are assembled on demand from the single instance store local to their mail server instead of being read from 250 databases across 5 servers as would happen in a naive implementation of the PULL model.
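The write amplification arithmetic in that scenario can be checked directly. The figures below are taken from the Exchange team's example configuration (100 users per database, 5,000 users per server), assuming for simplicity that the 25,000 recipients fill their databases evenly:

```python
recipients = 25_000
users_per_database = 100
users_per_server = 5_000

# Naive PUSH model: one copy written per recipient mailbox.
naive_writes = recipients

# Per-database single instancing: one copy per database that holds
# at least one recipient (recipients assumed evenly distributed).
databases_touched = recipients // users_per_database  # copies written
servers_touched = recipients // users_per_server

assert naive_writes == 25_000
assert databases_touched == 250  # 100x fewer writes than naive push
assert servers_touched == 5
```

The same 250 copies then serve all reads locally, which is where the latency win comes from.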

Scaling a service infrastructure is all about optimizations and trade-offs. My original post wasn't meant to imply that Twitter's scaling problems are unsolvable, since similar problems have obviously been solved by other online services and Enterprise products in the messaging space. It was meant to inject some perspective into the silly claims that Ruby on Rails is to blame and give readers a sense of the kind of architectural problems that have to be solved.

Now Playing: Lil Wayne - Got Money (feat. T-Pain)


 

Categories: Web Development

An important aspect of the growth of a social software application is how well it takes advantage of network effects. Not only should users get value from using the application, they should also get value from the fact that other people use the software as well. Typical examples of technologies that require network effects to even be useful are fax machines and email. Having a fax machine or an email address is useless if no one else has one and they get more useful as more people you communicate with get one. Of course, the opposite is also true and the value of your product declines more rapidly once your user base starts to shrink.

The important balancing act for social applications is taking as much advantage of the network effects of the application as possible while making sure the application still provides value even without them. Three examples of Web applications that have done this well are Flickr, MySpace and YouTube. Even without social features, Flickr is a great photo hosting site. However, its usage of tagging [which encourages discovery of similar pictures] and social networking [where you get to see your friends' photo streams] allows the site to benefit from network effects. Even without a social networking component, MySpace works well as a way for people and corporate brands to represent themselves online in what I'd like to think of as "Geocities 2.0". The same goes for YouTube.

All three of these sites had to face the technology adoption curve and cross the chasm between early adopters and pragmatists/conservatives. One thing that makes social Web sites a lot different from other kinds of technology is that potential customers can not only cheaply try them out but can also easily see the benefits that others are getting out of the site. This means it is very important to figure out how to expose pragmatist and conservative technology users to how much value early adopters are getting from your service. There are still only two ways of really making this happen: "word of mouth" and explicit advertising. In a world full of "Web 2.0" sites competing for people's attention, advertising is not only expensive but often ineffective. Thus word of mouth is king.

So now that we've agreed that word of mouth is important, the next thing to wonder about is how to build it into your product. The truth is that you not only have to create a great product, you also have to create passionate users. Kathy Sierra had two great charts on her site which I've included below

 

What is important isn't building out your checklist of features, it is making sure you have the right features with the right user experience for your audience. Although that sounds obvious, I've lost count of the amount of churn I've seen in the industry as people react to some announcement by Google, Facebook, or some cool new startup without thinking about how it fits into their user experience. When you are constantly adding features to your site without a cohesive vision as to how it all hangs together, your user experience will be crap. Which means you're unlikely to get users to the point where they love your product and are actively evangelizing it for free.

In addition to building a service that turns your users into evangelists, you want to acquire customers who would be effective evangelists, especially if they naturally attract an audience. For example, I realized MySpace was going to blow up like a rocket ship full of TNT when every time I went out clubbing the DJ would have the URL of their MySpace page in a prominent place on the stage. I felt similarly about Facebook whenever I heard college students enthusiastically talking about using it as a cool place to hang out online with all their other college friends and alumni. What is interesting to note is how both sites grew from two naturally influential and highly connected demographics to become massively mainstream.

One thing to always remember is that Social Software isn't about features, it is about users. It isn't about one-upping the competition in feature checklists; it is about capturing the right audience and then building an experience those users can't help evangelizing.

Of course, you need to ensure that when their friends do try out the service they are almost immediately impressed. That is easier said than done. For example, I heard a lot of hype about Xobni and finally got to try it out after hearing lots of enthusiastic evangelism from multiple people. After spending several minutes indexing my inbox, I got an app that took up the space of the To-Do sidebar in Outlook with a sidebar full of mostly pointless trivia about my email correspondence. Unfortunately, I'd rather know where and when my next meeting is occurring than that I've been on over 1000 emails with Mike Torres. So out went Xobni. Don't let the same happen to your application; as Kathy's graph above says, "How soon your users can start kicking ass" directly correlates with how passionate they end up being about your application.

PS: Some people will include viral marketing as a third alternative to traditional advertising and word of mouth. I personally question the effectiveness of this technique, which is why I didn't include it above.

Now Playing: Fall Out Boy - I Slept With Someone In Fall Out Boy and All I Got Was This Stupid Song Written About Me


 

Categories: Social Software

As a regular user of Twitter I've felt the waves of frustration wash over me these past couple of weeks as the service has been hit by one outage after another. This led me to start pondering the problem space [especially as it relates to what I'm currently working on at work] and deduce that the service must have some serious architectural flaws which have nothing to do with the reason usually thrown about by non-technical pundits (i.e. Ruby on Rails is to blame).

Some of my suspicions were confirmed by a recent post on the Twitter developer blog entitled  Twittering about architecture which contains the following excerpt

Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. Over the last year and a half we've tried to make our system behave like a messaging system as much as possible, but that's introduced a great deal of complexity and unpredictability. When we're in crisis mode, adding more instrumentation to help us navigate the web of interdependencies in our current architecture is often our primary recourse. This is, clearly, not optimal.

Our direction going forward is to replace our existing system, component-by-component, with parts that are designed from the ground up to meet the requirements that have emerged as Twitter has grown.

Given that Twitter has some unique requirements that would put a number of off-the-shelf and even custom messaging applications to the test, it is shocking that it isn't even architected as a messaging system. This makes sense when you consider that the founders' background was in blogging tools and they originally intended Twitter to be a "micro" blogging service.

If Twitter was simply a micro-content publishing tool with push notifications for SMS and IM, then the team wouldn't be faulted for designing it as a Content Management System (CMS). In that case you'd just need three data structures:

  • a persistent store for each user's tweets
  • a cache of their tweets in memory to improve read performance
  • a persistent list of [IM and SMS] endpoints subscribed to each user's tweets, and an asynchronous job (i.e. a daemon) which publishes each new post to the user's subscribers
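Those three data structures could be sketched as follows. This is a minimal in-memory model under my own naming, with a simple work queue standing in for the asynchronous publishing daemon:

```python
from collections import defaultdict, deque

class MicroCMS:
    """Toy model of a micro-content publishing tool: per-user tweet
    store, an in-memory read cache, and subscriber lists drained by
    an asynchronous daemon."""

    def __init__(self):
        self.tweets = defaultdict(list)       # persistent store per user
        self.cache = {}                       # user -> recent tweets
        self.subscribers = defaultdict(list)  # user -> [SMS/IM endpoints]
        self.outbox = deque()                 # work queue for the daemon

    def post(self, user, text):
        self.tweets[user].append(text)
        self.cache[user] = self.tweets[user][-20:]  # refresh read cache
        for endpoint in self.subscribers[user]:     # enqueue notifications
            self.outbox.append((endpoint, text))

    def run_daemon_once(self):
        """The asynchronous job: drain and deliver queued notifications."""
        delivered = []
        while self.outbox:
            delivered.append(self.outbox.popleft())
        return delivered

cms = MicroCMS()
cms.subscribers["dare"] = ["sms:+15550100", "im:friend@example.com"]
cms.post("dare", "hello world")
assert cms.cache["dare"] == ["hello world"]
assert len(cms.run_daemon_once()) == 2  # one notification per endpoint
```

Note there is no notion of followers here; each user's content lives in exactly one place, which is what makes the CMS design simple.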

Unfortunately, Twitter isn't just a blogging tool that allows people to subscribe to my posts via SMS & IM instead of just RSS. It also has the notion of followers. That's when things get hairy. Isreal over at AssetBar had a great post about this entitled Twitter-proxy: Any Interest? where he wrote

Consider the messaging problem:

Nothing is as easy as it looks. When Robert Scoble writes a simple “I’m hanging out with…” message, Twitter has about two choices of how they can dispatch that message:

  1. PUSH the message to the queues of each of his 6,864 (now 24,875) followers, or
  2. Wait for the 6,864 (now 24,875) followers to log in, then PULL the message.

The trouble with #2 is that people like Robert also follow 6,800 (now 21,146) people. And it’s unacceptable for him to log in and then have to wait for the system to open records on 6,800 (now 21,146) people (across multiple db shards), then sort the records by date and finally render the data. Users would be hating on the HUGE latency.

So, the twitter model is almost certainly #1. Robert’s message is copied (or pre-fetched) to 6,864 users, so when those users open their page/client, Scoble’s message is right there, waiting for them. The users are loving the speed, but Twitter is hating on the writes. All of the writes.

How many writes?

A 6000X multiplication factor:

Do you see a scaling problem with this scenario?

Scoble writes something–boom–6,800 (now 21,146) writes are kicked off. 1 for each follower.

Michael Arrington replies–boom–another 6,600 (now 17,918) writes.

Jason Calacanis jumps in–boom–another 6,500 (now 25,972) writes.

Beyond the 19,900 (now 65,036) writes, there’s a lot of additional overhead too. You have to hit a DB to figure out who the 19,900 (now 65,036) followers are. Read, read, read. Then possibly hit another DB to find out which shard they live on. Read, read, read. Then you make a connection and write to that DB host, and on success, go back and mark the update as successful. Depending on the details of their messaging system, all the overhead of lookup and accounting could be an even bigger task than the 19,900 (now 65,036) reads + 19,900 (now 65,036) writes. Do you even want to think about the replication issues (multiply by 2 or 3)? Watch out for locking, too.

And here’s the kicker: that giant processing & delivery effort–possibly a combined 100K (now 130K) disk IOs–was caused by 3 users, each just sending one, tiny, 140 char message. How innocent it all seemed.

Not only does Isreal's post accurately describe the problem with the logical model for Twitter's "followers" feature, it looks like he may have also nailed the details of their implementation, which would explain the significant issues they've had scaling the site. The problem is that if you naively implement a design that simply reflects the problem statement, you will be in disk I/O hell. It won't matter if you are using Ruby on Rails, Cobol on Cogs, C++ or hand-coded assembly; the read & write load will kill you.
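The two dispatch models in Isreal's post can be sketched side by side. All names and follower counts below are illustrative, not Twitter's actual schema; the point is where the cost lands, not the storage details:

```python
from collections import defaultdict

followers = defaultdict(set)   # author -> users following them
timelines = defaultdict(list)  # PULL model: one copy per author
inboxes = defaultdict(list)    # PUSH model: pre-materialized per reader

def post_push(author, text):
    """PUSH: one write per follower at post time; reads are cheap."""
    timelines[author].append(text)
    for reader in followers[author]:
        inboxes[reader].append((author, text))  # the fan-out writes
    return len(followers[author])               # writes kicked off

def read_pull(user, following, n=20):
    """PULL: writes are cheap; reads must merge every followed timeline."""
    merged = [(author, t) for author in following for t in timelines[author]]
    return merged[-n:]

followers["scoble"] = {f"user{i}" for i in range(1000)}
writes = post_push("scoble", "hanging out with...")
assert writes == 1000  # one tiny post triggers 1,000 writes
assert read_pull("user0", ["scoble"]) == [("scoble", "hanging out with...")]
```

In the push model the cost of `post_push` grows with follower count; in the pull model `read_pull` grows with the number of people followed, and across database shards each of those lookups becomes a network round trip.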

This leads me to my new mantra which I've stolen from Jim Gray via Pat Helland; DISK IS THE NEW TAPE.

In addition, the fact that the Twitter folks decided not to cap the number of followers or followees may have saved them from the kind of flames Robert Scoble directs at Facebook for its 5000 friend limit, but it also means they have to support both users with thousands of followers and users who follow thousands of people [each of which would be optimized in completely different ways]. Clearly a lot of feature decisions have been made on that product without thought to how they impact the scalability of the service.

PS: If this problem space sounds interesting to you, we're hiring. I'm specifically looking for good operations folks. Shoot me a resume at dareo@msft.com - replace msft.com with microsoft.com

Now Playing: Rick Ross - Maybach Music (featuring Jay-Z)


 

Categories: Web Development

If you work in the technology industry it pays to be familiar with the ideas from Geoffrey Moore's insightful book Crossing the Chasm. In the book he takes a look at the classic marketing bell curve that segments customers into Early Adopters, Pragmatists, Conservatives and Laggards then points out that there is a large chasm to cross when it comes to becoming popular beyond an initial set of early adopters. There is a good review of his ideas in Eric Sink's blog post entitled Act Your Age which is excerpted below

The people in your market segment are divided into four groups:

Early Adopters are risk takers who actually like to try new things.

Pragmatists might be willing to use new technology, if it's the only way to get their problem solved.

Conservatives dislike new technology and try to avoid it.

Laggards pride themselves on the fact that they are the last to try anything new.

This drawing reflects the fact that there is no smooth or logical transition between the Early Adopters and the Pragmatists.  In between the Early Adopters and the Pragmatists there is a chasm.  To successfully sell your product to the Pragmatists, you must "cross the chasm". 

The knowledge that the needs of early adopters and those of the majority of your potential user base differ significantly is extremely important when building and marketing any technology product. A lot of companies have ended up either building the wrong product or focusing their product too narrowly because they listened too intently to their initial customer base without realizing that they were talking to early adopters.

The fact is that early adopters have different problems and needs from regular users. This is especially true when you compare the demographics of the Silicon Valley early adopter crowd, which "Web 2.0" startups often try to court, with those of the typical users of social software on the Web. In the few years I've been working on building Web applications, I've seen a number of technology trends and products heralded as the next big thing by technology pundits that never broke into the mainstream because they don't solve the problems of regular Internet users. Here are some examples:

  • Blog Search: A few years ago, blog search engines were all the rage. You had people like Marc Cuban talking up IceRocket and Robert Scoble haranguing Web search companies to build dedicated blog search engines. Since then the products in that space have either given up the ghost (e.g. PubSub, Feedster), turned out to be irrelevant (e.g. Technorati, IceRocket) or were sidelined (e.g. Google Blog Search, Yahoo! Blog Search). The problem with this product category is that except for journalists, marketers and ego-surfing A-list bloggers, there aren't many people who need a specialized feature set around searching blogs.

  • Social bookmarking: Although del.icio.us popularized a number of "Web 2.0" trends such as tagging, REST APIs and adding social features to a previously individual task, it has never really taken off as a mainstream product. According to the former VC behind the service it seems to have peaked at 2 million unique visitors last year and is now seeing about half that number of unique users. Compare that to Yahoo! bookmarks which was seeing 20 million active users a year and a half ago.

  • RSS Readers: I've lost track of all of the "this is the year RSS goes mainstream" articles I've read over the past few years. Although RSS has turned out to be a key technology which powers a lot of interesting functionality behind the scenes (e.g. podcasting), actually subscribing to and reading news feeds in an RSS reader has not become a mainstream activity for Web users. When you think about it, it is kind of obvious. The problem an RSS reader solves is "I read so many blogs and news sites on a daily basis, I need a tool to help me keep them all straight". How many people who aren't enthusiastic early adopters (i) have this problem and (ii) think they need a tool to deal with it?

These are just the first three that came to mind. I'm sure readers can come up with more examples of their own. This isn't to say that all hyped "Web 2.0" sites haven't lived up to their promise. Flickr is an example of an early adopter hyped site that showed up sprinkled with "Web 2.0" goodness that has become a major part of the daily lives of tens of millions of people across the Web.

When you look at the list of top 50 sites in the U.S. by unique visitors it is interesting to note what common theme unites the recent "Web 2.0" entrants into that list. There are the social networking sites like MySpace and Facebook which harness the natural need of young people to express their individuality yet be part of social cliques. Then there are the sites which provide lots of flexible options that enable people to share their media with their friends, family or the general public, such as Flickr and YouTube. Both sites also have figured out how to harness the work of the few to entertain and benefit the many, as have Wikipedia and Digg. Then there are sites like Fling and AdultFriendFinder which seem to now get more traffic than the personals sites you see advertised on TV, for obvious reasons.

However, the one overriding theme among all of these recent entrants is that they solve problems that everyone [or at least a large section of the populace] has. Everyone likes to communicate with their social circle. Everyone likes watching funny videos and looking at cute pics. Everyone wants to find information about topics they're interested in or find out what's going on around them. Everybody wants to get laid.

If you are a Web 2.0 company in today's Web you really need to ask yourselves, "Are we solving a problem that everybody has or are we building a product for Robert Scoble?"

Now Playing: Three 6 Mafia - I'd Rather (feat. DJ Unk)


 

Categories: Social Software | Technology