Marc Andreessen (whose blog is on fire!) has a rather lengthy but excellent blog post entitled The Pmarca Guide to Big Companies, part 2: Retaining great people which has some good advice on how big companies can retain their best employees. The most interesting aspects of his post were some of the accurate observations he had about obviously bad ideas that big companies implement which are intended to retain their best employees but end up backfiring. I thought these insights were valuable enough that they are worth repeating.

Marc writes

Don't create a new group or organization within your company whose job is "innovation". This takes various forms, but it happens reasonably often when a big company gets into product trouble, and it's hugely damaging.

Here's why:

First, you send the terrible message to the rest of the organization that they're not supposed to innovate.

Second, you send the terrible message to the rest of the organization that you think they're the B team.

That's a one-two punch that will seriously screw things up.

This is so true. Every time I've seen some executive or management higher-up create an incubation or innovation team within a specific product group, it has led to demoralization of the people who were relegated to the "B team" and bad blood between both teams, which eventually leads to in-fighting. All of this might be worth it if these efforts were successful, but as Clayton Christensen pointed out in his interview in Business Week on the tenth anniversary of "The Innovator's Dilemma"

People come up with lots of new ideas, but nothing happens. They get very disillusioned. Never does an idea pop out of a person's head as a completely fleshed-out business plan. It has to go through a process to get approved and funded. You're not two weeks into the process before you realize, "gosh, the sales force is not going to sell this thing," and you change the economics. Then two weeks later, marketing says they won't support it because it doesn't fit the brand, so we've got to change the whole concept.

All those forces act to make the idea conform to the company's existing business model, not to the marketplace. And that's the rub. So the senior managers today, thirsty for innovation, stand at the outlet of this pipe, see the dribbling out of me-too innovation after me-too innovation, and they scream up to the back end, "Hey, you guys, get more innovative! We need more and better innovative ideas!" But that's not the problem. The problem is this shaping process that conforms all these innovative ideas to the current business model of the company.

This is something I've seen happen time after time. There are times when incubation/innovation teams produce worthwhile results, but they are few and far between, especially compared to the number of such teams that exist. In addition, even in those cases both of Marc's observations still held true, and the resulting in-fighting between the teams damaged the overall health of the product, the people and the organization.

Marc also wrote

Don't do arbitrary large spot bonuses or restricted stock grants to try to give a small number of people huge financial upside.

An example is the Google Founders' Awards program, which Google has largely stopped, and which didn't work anyway.

It sounds like a great idea at the time, but it causes a severe backlash among both the normal people who don't get it (who feel like they're the B team) and the great people who don't get it (who feel like they've been screwed).

Significantly differentiated financial rewards for your "best employees" are a seductive idea for executives but they rarely work as planned, for several reasons. One reason is based on an observation I first saw in Paul Graham's essay Hiring is Obsolete: big companies don't know how to value the contributions of individual employees. Robert Scoble often used to complain in the comments to his blog that he made less than six figures at Microsoft. I personally think he did more for the company's image than the millions we've spent on high priced public relations and advertising firms. Yet it is incredibly difficult to prove this, and even if one could, the process wouldn't scale to every single employee. Then there's all the research from various corporations that have used social network analysis to find that their most valuable employees are rarely the ones high up in the org chart (see How Org Charts Lie published by the Harvard Business School). The second reason significantly rewarding your "best employees" financially ends up being problematic is well described in Joel Spolsky's article Incentive Pay Considered Harmful, where he points out

Most people think that they do pretty good work (even if they don't). It's just a little trick our minds play on us to keep life bearable. So if everybody thinks they do good work, and the reviews are merely correct (which is not very easy to achieve), then most people will be disappointed by their reviews.

When you combine the above observation with the practice of rewarding those that get good reviews disproportionately compared to those that just did OK, it can lead to problems. For example, what happens when a company decides that it will give millions of dollars in bonuses to the employees who "add the most value" to the company? Hey, isn't that what the Google Founders' Awards were supposed to be about...how did that turn out?

The company has continually tinkered with its incentives for people to stay. Early on Page and Brin gave "Founders' Awards" in cash to people who made significant contributions. The handful of employees who pulled off the unusual Dutch auction public offering in August 2004 shared $10 million. The idea was to replicate the windfall rewards of a startup, but it backfired because those who didn't get them felt overlooked. "It ended up pissing way more people off," says one veteran.

Google rarely gives Founders' Awards now, preferring to dole out smaller executive awards, often augmented by in-person visits by Page and Brin. "We are still trying to capture the energy of a startup," says Bock.

Another seductive idea that sounds good on paper which falls apart when you actually add human beings to the equation.

 

In my previous post, I mentioned that I'm in the early stages of building an application on the Facebook platform. I haven't yet decided on an application but for now, let's assume that it is a Favorite Comic Books application which allows me to store my favorite comic books and shows me the most popular comic books among my friends.

After investigating using Amazon's EC2 + S3 to build my application, I've decided that I'm better off using a traditional hosting solution running on either the LAMP or WISC platform. One of the things I've been looking at is which platform has better support for providing an in-memory caching solution that works well in the context of a Web farm (i.e. multiple Web servers) out of the box. While working on the platforms behind several high traffic Windows Live services, I've learned that you should be prepared to deal with scalability issues and caching is one of the best ways to get bang for the buck when improving the scalability of your service.

I recently discovered memcached, a distributed object caching system originally developed by Brad Fitzpatrick of LiveJournal fame. You can think of memcached as a giant hash table that can run on multiple servers which automatically handles balancing the objects hashed to each server and transparently fetches/removes objects over the network if they aren't on the same machine that is accessing an object in the hash table. Although this sounds fairly simple, there is a lot of grunt work in building a distributed object cache which handles data partitioning across multiple servers and hides the distributed nature of the application from the developer. memcached is well integrated into the typical LAMP stack and is used by a surprising number of high traffic websites including Slashdot, Facebook, Digg, Flickr and Wikipedia. Below is what C# code that utilizes memcached would look like, sans exception handling code

public ArrayList GetFriends(int user_id){

    ArrayList friends = (ArrayList) myCache.Get("friendslist:" + user_id);

    if(friends == null){
        friends = new ArrayList();

        // Open the connection
        dbConnection.Open();

        SqlCommand cmd = new SqlCommand("select friend_id from friends_list where owner_id=" + user_id, dbConnection);

        SqlDataReader reader = cmd.ExecuteReader();

        // Add each friend ID to the list
        while (reader.Read()){
            friends.Add(reader[0]);
        }

        reader.Close();
        dbConnection.Close();

        // Store the list in the cache for subsequent requests
        myCache.Set("friendslist:" + user_id, friends);
    }

    return friends;
}

public void AddFriend(int user_id, int new_friend_id){

    // Open the connection
    dbConnection.Open();

    SqlCommand cmd = new SqlCommand("insert into friends_list (owner_id, friend_id) values (" + user_id + "," + new_friend_id + ")", dbConnection);
    cmd.ExecuteNonQuery();

    // Remove the key from the cache since the friends list has been updated
    myCache.Delete("friendslist:" + user_id);

    dbConnection.Close();
}

The benefits of using the cache should be pretty obvious. I no longer need to hit the database after the first request to retrieve the user's friend list, which means faster performance in servicing the request and less I/O. memcached automatically handles purging items out of the cache when it hits its size limit and also decides which cache servers should hold individual key<->value pairs.
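To make the "giant hash table spread across multiple servers" idea concrete, below is a minimal sketch of how a client library might decide which cache server owns a given key. This is just an illustration of the concept and not the algorithm any actual memcached client uses; the class and method names are made up.

public class NaiveCacheClient {

    private string[] servers;

    public NaiveCacheClient(string[] servers){
        this.servers = servers;
    }

    // Every Web front end configured with the same server list computes the
    // same key -> server mapping, so they all agree on which machine holds
    // "friendslist:1234" without any central coordination
    public string GetServerForKey(string key){
        int hash = key.GetHashCode() & 0x7FFFFFFF; // force non-negative
        return servers[hash % servers.Length];
    }
}

One caveat of this naive modulo scheme is that adding or removing a server remaps most keys, which is why production clients tend to use smarter hashing schemes that minimize the number of keys that move.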

I hang with a number of Web developers on the WISC platform and I don't think I've ever heard anyone mention memcached or anything like it. In fact, I couldn't find a mention of it on Microsoft employee blogs, ASP.NET developer blogs or on MSDN. So I wondered what the average WISC developer uses as their in-memory caching solution.

After looking around a bit, I came to the conclusion that most WISC developers use the built-in ASP.NET caching features. ASP.NET provides a number of in-memory caching features including a Cache class which provides a similar API to memcached, page directives for caching portions of the page or the entire page, and the ability to create dependencies between cached objects and the files or database tables/rows that they were populated from via the CacheDependency and SqlCacheDependency classes. Although some of these features are also available in various open source web development frameworks such as Ruby on Rails + memcached, none seem to give as much functionality out of the box as ASP.NET.

Below is what the code for the GetFriends and AddFriend methods would look like using the built-in ASP.NET caching features

public ArrayList GetFriends(int user_id){

    ArrayList friends = (ArrayList) Cache.Get("friendslist:" + user_id);

    if(friends == null){
        friends = new ArrayList();

        // Open the connection
        dbConnection.Open();

        SqlCommand cmd = new SqlCommand("select friend_id from friends_list where owner_id=" + user_id, dbConnection);

        SqlCacheDependency dependency = new SqlCacheDependency(cmd);
        SqlDataReader reader = cmd.ExecuteReader();

        // Add each friend ID to the list
        while (reader.Read()){
            friends.Add(reader[0]);
        }

        reader.Close();
        dbConnection.Close();

        // Insert friends list into cache with associated dependency
        Cache.Insert("friendslist:" + user_id, friends, dependency);
    }
    return friends;
}

public void AddFriend(int user_id, int new_friend_id){
    // Open the connection
    dbConnection.Open();

    SqlCommand cmd = new SqlCommand("insert into friends_list (owner_id, friend_id) values (" + user_id + "," + new_friend_id + ")", dbConnection);
    cmd.ExecuteNonQuery();

    /* no need to remove from cache because SqlCacheDependency takes care of that automatically */
    // Cache.Remove("friendslist:" + user_id);

    dbConnection.Close();
}

Using the SqlCacheDependency class gets around a significant limitation of the ASP.NET Cache class. Specifically, the cache is not distributed. This means that if you have multiple Web front ends, you'd have to write your own code to handle partitioning data and invalidating caches across your various Web server instances. In fact, there are numerous articles showing how to implement such a solution including Synchronizing the ASP.NET Cache across AppDomains and Web Farms by Peter Bromberg and Use Data Caching Techniques to Boost Performance and Ensure Synchronization by David Burgett.
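For illustration, below is a bare-bones sketch of the kind of roll-your-own invalidation those articles describe: after a write, the Web server that performed it notifies its peers so each can evict its local copy. The peer list and the /invalidate.ashx endpoint are hypothetical, and a real implementation would need error handling and security.

using System.Net;

public static class FarmCacheInvalidator {

    // Hypothetical list of the other Web servers in the farm
    static readonly string[] peers = { "http://web01", "http://web02" };

    public static void InvalidateAcrossFarm(string cacheKey){
        foreach (string peer in peers){
            // Each peer hosts a handler that calls Cache.Remove(cacheKey)
            // when it receives this request
            using (WebClient client = new WebClient()){
                client.DownloadString(peer + "/invalidate.ashx?key=" + cacheKey);
            }
        }
    }
}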

However, let's consider how SqlCacheDependency is implemented. If you are using SQL Server 7 or SQL Server 2000, then your ASP.NET process polls the database at regular intervals to determine whether the target(s) of the original query have changed. For SQL Server 2005, the database can be configured to send change notifications to the Web servers if the target(s) of the original query change. Either way, the database is doing work to determine if the data has changed. Compared to memcached, this still doesn't seem as efficient as we could get if we wanted to eke out every last drop of performance from the system, although it does lead to simpler code.
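As a side note, if I understand the SQL Server 2005 notification mode correctly, the application has to opt in by starting the notification listener at startup, along the lines of the sketch below. This assumes a connection string named "MyDb" and a database with Service Broker enabled.

using System;
using System.Configuration;
using System.Data.SqlClient;

public class Global : System.Web.HttpApplication {

    protected void Application_Start(object sender, EventArgs e){
        // Start the listener that receives change notifications; without this,
        // command-based SqlCacheDependency instances won't be notified
        SqlDependency.Start(ConfigurationManager.ConnectionStrings["MyDb"].ConnectionString);
    }

    protected void Application_End(object sender, EventArgs e){
        SqlDependency.Stop(ConfigurationManager.ConnectionStrings["MyDb"].ConnectionString);
    }
}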

If you are a developer on the WISC platform and are concerned about getting the best performance out of your Web site, you should take a look at memcached for Win32. The most highly trafficked site on the WISC platform is probably MySpace and in articles about how their platform works, such as Inside MySpace.com, they extol the virtues of moving work out of the database and relying on cache servers.


 

Categories: Platforms | Programming | Web Development

In my efforts to learn more about Web development and what it is like for startups adopting Web platforms, I've decided to build an application on the Facebook platform. I haven't yet decided on an application but for the sake of argument let's say it is a Favorite Comic Books application which allows me to store my favorite comic books and shows me the most popular comic books among my friends.

The platform requirements for the application seem pretty straightforward. I'll need a database and some RESTful Web services which provide access to the database from the widget, which can be written in my language of choice. I'll also need to write the widget in FBML, which will likely mean I'll have to host images and CSS files as well. So far nothing seems particularly esoteric.

Since I didn't want my little experiment to eventually cost me a lot of money, I thought this was an excellent time to try out Amazon's Simple Storage Service (S3) and Elastic Compute Cloud (EC2) services since I'll only pay for as many resources as I use instead of paying a flat hosting fee.

However it seems supporting this fairly straightforward application is beyond the current capabilities of EC2 + S3. S3 is primarily geared towards file storage so although it makes a good choice for cheaply hosting images and CSS stylesheets, it's not a good choice for storing relational or structured data. If it was just searching within a single user's data (e.g. just searching within my favorite comics), I could store it all in a single XML file then use XPath to find what I was looking for. However my application will need to perform aggregated queries across multiple users' data (i.e. looking at the favorite comics of all of my friends then fetching the most popular ones) so a file based solution isn't a good fit. I really want a relational database.
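For what it's worth, the single-user case really would be that simple. Below is a rough sketch of querying one user's favorites with XPath; the file layout and element names are made up for illustration.

using System;
using System.Xml;

public class FavoriteComics {

    // Print one user's favorite comics from a given publisher, e.g.
    // PrintFavorites("carnage4life.xml", "Marvel")
    public static void PrintFavorites(string path, string publisher){
        XmlDocument doc = new XmlDocument();
        doc.Load(path);

        XmlNodeList comics = doc.SelectNodes("/favorites/comic[@publisher='" + publisher + "']");

        foreach (XmlNode comic in comics){
            Console.WriteLine(comic.Attributes["title"].Value);
        }
    }
}

But answering "which comics are most popular among my friends?" this way means loading and joining dozens of files like this on every request, which is exactly the kind of aggregation a relational database is built for.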

EC2 seemed really promising because I could create a virtual server running in Amazon's cloud and load it up with my choice of operating system, database and Web development tools. Unfortunately, there was a fly in the ointment. There is no persistent storage in EC2 so if your virtual server goes down for any reason such as taking it down to install security patches or a system crash, all your data is lost.

This is a well known problem within the EC2 community which has resulted in a bunch of clever hacks being proposed by a number of parties. In his post entitled Amazon EC2, MySQL, Amazon S3 Jeff Barr of Amazon writes

I was on a conference call yesterday and the topic of ways to store persistent data when using Amazon EC2 came up a couple of times. It would be really cool to have a persistent instance of a relational database like MySQL but there's nothing like that around at the moment. An instance can have a copy of MySQL installed and can store as much data as it would like (subject to the 160GB size limit for the virtual disk drive) but there's no way to ensure that the data is backed up in case the instance terminates without warning.

Or is there?

It is fairly easy to configure multiple instances of MySQL in a number of master-slave, master-master, and other topologies. The master instances produce a transaction log each time a change is made to a database record. The slaves or co-masters keep an open connection to the master, reading the changes as they are logged and mimicking the change on the local copy. There can be some replication delay for various reasons, but the slaves have all of the information needed to maintain exact copies of the database tables on the master.

Besides the added complexity this places on the application, it still isn't foolproof, as is pointed out in the various comments in response to Jeff's post.

Demitrious Kelly, who also realizes the problems with relying on replication to solve the persistence problem, proposed an alternative solution in his post MySQL on Amazon EC2 (my thoughts) where he writes

Step #2: I’m half the instance I used to be! With each AMI you get 160GB of (mutable) disk space, and almost 2GB of ram, and the equivalent of a Xeon 1.75Ghz processor. Now divide that, roughly, in half. You’ve done that little math exercise because your one AMI is going to act as 2 AMI's. Thats right. I’m recommending running two separate instances of MySQL on the single server.

Before you start shouting at the heretic, hear me out!

+-----------+   +-----------+
|  Server A |   |  Server B |
+-----------+   +-----------+
|  My |  My |   |  My |  My |
|  sQ |  sQ |   |  sQ |  sQ |
|  l  |  l  |   |  l  |  l  |
|     |     |   |     |     |
| #2<=== #1 <===> #1 ===>#2 |
|     |     |   |     |     |
+ - - - - - +   + - - - - - +

On each of our servers, MySQL #1 and #2 both occupy a max of 70GB of space. The MySQL #1 instances of all the servers are set up in a master-master topology. And the #2 instance is set up as a slave only of the #1 instance on the same server. So on server A, MySQL #2 is a copy (one way) of #1 on server A.

With the above setup *if* server B were to get restarted for some reason you could: A) shut down the MySQL instance #2 on server A. Copy that MySQL #2 over to Both slots on server B. Bring up #1 on server B (there should be no need to reconfigure its replication relationship because #2 pointed at #1 on server A already). Bring up #2 on server B, and reconfigure replication to pull from #1 on ServerB. This whole time #1 on Server A never went down. Your services were never disrupted.

Also with the setup above it is possible (and advised) to regularly shut down #2 and copy it into S3. This gives you one more layer of fault tolerance (and, I might add, the ability to back up without going down.)

Both solutions are fairly complicated and error prone, and they still don't give you as much reliability as you would get if you simply had a hard disk that didn't lose all its data when the server goes down. At this point it is clear that a traditional hosted service solution is the route to go. Any good suggestions for LAMP or WISC hosting that won't cost an arm and a leg? Is Joyent any good?

PS: It is clear this is a significant problem for Amazon's grid computing play, and one that has to be fixed if the company is serious about getting into the grid computing game and providing a viable alternative to startups looking for a platform on which to build the next "Web 2.0" hit. Building a large scale, distributed, relational database and then making it available to developers as a platform is unprecedented, so they have their work cut out for them. I'd incorrectly assumed that BigTable was the precedent for this but I've since learned that BigTable is more like a large scale, distributed spreadsheet table as opposed to a relational database. This explains a lot of the characteristics of the query API of Google Base.


 

Categories: Web Development

None of these was worth an entire post.

  1. Universal Music Group Refuses to Renew Apple's Annual License to Sell Their Music on iTunes: So this is what it looks like when an industry that has existed for decades begins to die. I wonder who's going to lose out more? Apple, because some people stop buying iPods since they can't buy music from Jay-Z and Eminem on iTunes, or Universal Music Group, for closing itself out of the biggest digital music marketplace in the world in the midst of declining CD sales worldwide? It's as if the record labels are determined to make themselves irrelevant by any means necessary.

  2. Standard URLs - Proposal for a Web with Less Search: Wouldn't it be cool if every website in the world used the exact same URL structure based on some ghetto reimplementation of the Dewey Decimal System? That way I could always type http://www.amazon.com/books/j-k-rowling/harry-potter-and-the-goblet-of-fire or http://www.bn.com/books/j-k-rowling/harry-potter-and-the-goblet-of-fire  to find the Harry Potter book on whatever book website I was on instead of typing "harry potter goblet of fire" into a search box. Seriously.

    This is the kind of idea that makes sense when you are kicking it with your homeboys late at night drinking 40s and smokin' blunts but ends up making you scratch your head in the morning when you sober up, wondering how you could have ever come up with such a ludicrous idea.

  3. Facebook has 'thrown the entire startup world for a loop': This post is by a startup developer complaining that Facebook has placed limits on usage of their APIs which prevent Facebook widgets from spamming a user's friends when the user adds the widget to their profile. What does he expect? That Facebook should make it easier for applications to spam their users? WTF? Go read Mike Torres's post Facebook weirdness then come back and explain to me why the folks at Facebook should be making it easier for applications to send spam on a user's behalf in the name of encouraging the "viral growth of apps".

  4. Does negative press make you Sicko? A Google ad sales rep makes an impassioned pitch to big pharmaceutical companies and HMOs to counter the negative attention from Michael Moore's Sicko by buying Google search ads and getting Google to create "Get the Facts" campaigns for them. I guess all that stuff Adam Bosworth said about Google wanting to help create better educated patients doesn't count since patients don't buy ads. ;) Talk about making your employer look like an unscrupulous, money grubbing whore. Especially this bit:

    Do no evil. It's now Search, Ads and Apps

  5. People Who Got in Line for an iPhone: I was at the AT&T store on the day of the iPhone launch to pick up a USB cable for my fiancée. It took me less than ten minutes to deal with the line at around 8:00 PM and they still had lots of iPhones. It seems people had waited hours in line that day and I could have picked one up with just ten minutes of waiting on launch day if I wanted one. I bet if you came on Saturday the lines were even shorter and by today you could walk in. Of course, this is assuming you are crazy enough to buy a v1 iPhone in the first place.


 

Working on social software, I've been thinking about the dynamics of walled gardens recently. There is a strong tendency for Web applications that benefit from network effects to metamorphose into walled gardens. Unfortunately, I haven't found a definition of "walled garden" that I like. Most references to "walled gardens" in the context of internet services refer to ISPs creating their own closed networks which are "pay to play" for publishers, as opposed to giving their users access to the World Wide Web. The most popular examples of this are the AOL of old and, most recently, mobile phone carriers. However this definition doesn't really capture the new way in which users are being tied to a particular vendor's vision of the Web, thanks to the power of network effects.

In his post Facebook vs. AOL, redux Jason Kottke writes

I wanted to clarify my comments about Facebook's similarities to AOL. I don't think Facebook is a bad company or that they won't be successful; they seem like smart passionate people who genuinely care about making a great space for their users. It's just that I, unlike many other people, don't think that Facebook and Facebook Platform are the future of the web. The platform is great for Facebook, but it's a step sideways or even backwards (towards an AOL-style service) for the web.

Think of it this way. Facebook is an intranet for you and your friends that just happens to be accessible without a VPN. If you're not a Facebook user, you can't do anything with the site...nearly everything published by their users is private. Google doesn't index any user-created information on Facebook.
...Compare this with MySpace or Flickr or YouTube. Much of the information generated on these sites is publicly available. The pages are indexed by search engines. You don't have to be a user to participate
...
Everything you can do on Facebook with ease is possible using a loose coalition of blogging software, IM clients, email, Twitter, Flickr, Google Reader, etc.

In his post Avoiding Walled Gardens on the Internet Jeff Atwood writes

I occasionally get requests to join private social networking sites, like LinkedIn or Facebook. I always politely decline. I understand the appeal of private social networking, and I mean no disrespect to the people who send invites. But it's just not for me.

I feel very strongly that we already have the world's best public social networking tool right in front of us: it's called the internet. Public services on the web, such as blogs, twitter, flickr, and so forth, are what we should invest our time in. And because it's public, we can leverage the immense power of internet search to tie it all-- and each other-- together.

In comparison, adding content to a private, walled garden on the internet smacks of the old-world America Online ideology:

Jason and Jeff are both smart guys but they think like geeks. To me it seems pretty obvious why the average person would want to use one application for managing photos, blogging, IM, reading feeds with updates from their friends, etc. instead of using half a dozen products. Especially if the one product fosters a sense of community better than any of the other individual products does on its own. Of course, I've said all this before in my post Why Facebook is Bigger than Blogging so I won't repeat myself here.

What I do find interesting is trying to define what makes Facebook a "walled garden". Jeff and Jason seem to think it primarily hinges on the content produced on the site being visible to search engines. This belief seems fairly widespread since I've also seen mentions of it in blog posts by Steve Rubel and danah boyd. This definition doesn't sit right with me. I actually think it's a good thing that the drunk frat party pics, emotional public break-ups and experimentation with new ideas that make up the average high schooler or college student's life (i.e. the primary demographic of social networks) aren't out there for search engines to index, cache and then keep around forever. From the perspective of Jeff and Jason, it seems the problem with walled gardens is that people outside the garden can't see its beauty; from my perspective, the problem is that they surround their users with beauty to disguise the fact that they are trapped (or should I say locked in?). Thus I actually think of "walled gardens" as services that limit their users. This is the difference between being able to send email to anyone on the internet and only being able to send email to people who use the same ISP. It's the difference between being able to send instant messages to anyone who uses an IM client instead of just to people who use the same IM service or use a product that has "paid to play" in your IM network. It's the difference between being able to accept any payment on your online auction (e.g. Google Checkout) and being told you can only use the one provided by the owner of the marketplace (e.g. PayPal).

When you look at things from that perspective, there are more walled gardens on the Web than people would care to admit. Why this is the case is eloquently explained in this comment by leoc on reddit which is excerpted below

Certainly it's not simple, but that's why it would be the "next Web" rather than what we have now. It seems there are two big forces creating the Web 2.0 social-site bottleneck.

One is the fact that hosting is fiddly and expensive. It's certainly better than it was, but it's still the preserve of nerds and professionals. (El-cheapo PHP4 hosting is almost cheap enough but too fiddly and much too limited; virtual hosting and up is flexible but much too fiddly and expensive.) We need a future where the casual user can drag-n-drop fairly arbitrary programs and documents into his own webspace without having to worry too much about configurations or Digg-storms or bandwidth fees.

The other is that centrally-controlled databases are the low-hanging fruit; it's much harder to create decentralised systems that work. It's much easier for Amazon to have users log in and enter their book reviews in Amazon's database than it would be for you to find and aggregate all the reviews for a given book from the Web, then reputation/spam-filter them and present them in some kind of coherent framework. (And that's assuming that people would rather put reviews in their own webspace rather than just typing into an Amazon textarea - problem one again.) Similarly, the classic wiki model is inherently centralised

In today's world, it is far easier and cheaper for the average Web user to use a centralized hosted service than to run a service on their own Web space, even if they can get the software for free. In addition, a lot of software on the Web, especially social software applications, benefits from network effects, so once a service hits a certain critical mass, displacing it is a losing proposition whether it is an online auction marketplace or the #1 instant messaging application. When you put these things together you get a world where the dominant software in certain categories tends towards a monopoly or at the very least conforms to a power law distribution.

And once a product gets to that point, it is easy to think in terms of preserving marketshare as opposed to giving users choice. At that point, another walled garden has been created.


 

Categories: Social Software

Recently an email written by a newly hired Microsoft employee about life as a Google employee made the rounds on popular geek social news and social bookmarking sites such as Slashdot, Reddit, del.icio.us and Digg. The mail was forwarded around and posted to a blog without the permission of its original author. The author of the email (who should not be confused with the idiot who blogs at http://no2google.wordpress.com) has posted a response which puts his email in context in addition to his reaction on seeing his words splattered across the Internet. In his post entitled My Words Geoffrey writes

Today my words got splashed all around the Internet. It’s interesting to see them living a life of their own outside the context they were created in. I enjoyed seeing it on Slashdot, reading the thoughtful responses whether they agreed or disagreed, and laughing out loud at the people who were just there to make noise. It’s fun, in the abstract, to be the author of the secret thing everyone is gathered around the water cooler talking about.

The responses are my personal impressions, communicated to my Microsoft recruiter in the context of a private 1:1 conversation. A few days after I sent my response to the recruiter, I saw an anonymized version floating around and being discussed inside Microsoft. I hadn’t realized at the time that I wrote it that it would be distributed widely within Microsoft so that was a bit of a shock. To see them distributed all over the Internet was another shock altogether. The biggest shock was when Mary Jo Foley over at ZDNet Blogs sent a message to my personal email account.

Read the rest of his post to see the email he sent to Mary Jo Foley as well as how he feels about having words he thought were being shared in private published to tens of thousands of people without his permission and with no thought to how it would impact him.


 

Categories: Life in the B0rg Cube

I was recently in a conversation where we were talking about things we'd learned in college that helped us in our jobs today. I tried to distill it down to one thing but couldn't so here are the three I came up with.

  1. Operating systems aren't elegant. They are a glorious heap of performance hacks piled one upon the other.
  2. Software engineering is the art of amassing collected anecdotes and calling them Best Practices when in truth they have more in common with fads than anything else.
  3. Pizza is better than Chinese food for late night coding sessions.

What are yours?


 

Categories: Technology

Disclaimer: This is my opinion. It does not reflect the intentions, strategies, plans or stated goals of my employer

Ever since the last Microsoft reorg, where its Web products were spread out across 3 Vice Presidents, I've puzzled over why the company would want to fragment its product direction in such a competitive space instead of having a single person responsible for its online strategy.

Today, I was reading an interview with Chris Jones, the corporate vice president of Windows Live Experience Program Management entitled Windows Live Moves Into Next Phase with Renewed Focus on Software + Services and a lightbulb went off in my head. The relevant bits are excerpted below

PressPass: What else is Microsoft announcing today?

Jones: Today we’re also releasing a couple of exciting new services from Windows Live into managed beta testing: Windows Live Photo Gallery beta and Windows Live Folders beta.

Windows Live Photo Gallery is an upgrade to Windows Vista’s Windows Photo Gallery, offered at no charge, and enables both Windows Vista and Windows XP SP2 customers to share, edit, organize and print photos and digital home videos... We’re also releasing Windows Live Folders into managed beta today, which will provide customers with 500 megabytes of online storage at no charge.
...
We’re excited about these services and we see today’s releases as yet another important step on the path toward the next generation of Windows Live, building on top of the momentum of other interesting beta releases we’ve shared recently such as Windows Live Mail beta, Windows Live Messenger beta and Windows Live Writer beta....soon we’ll begin to offer a single installer which will give customers the option of an all-in-one download for the full Windows Live suite of services instead of the separate installation experience you see today. It’s going to be an exciting area to watch, and there’s a lot more to come.

PressPass: You talk a lot about a “software plus services” strategy. What does that mean and how does it apply to what you’re talking about today?

Jones: It’s become a buzz word of sorts in the industry, but it’s a strategy we truly believe in. The fact that we’re committed to delivering software plus services means we’re focused on building rich experiences on top of your Windows PC; services like those offered through Windows Live.

All the items in red font refer to Windows desktop applications in one way or another. At this point it made sense to me why there were three VPs running different bits of Microsoft's online products and why one of them was also the VP who owned Windows. The last reorg seems to have divided Microsoft's major tasks in the online space across the various VPs in the following manner

  • Satya Nadella: Running the search + search ads business (i.e. primarily competing with Google search and AdWords)

  • Steve Berkowitz: Running the content + display ads business (i.e. primarily competing with Yahoo!'s content and display ad offerings)

  • Steven Sinofsky and Chris Jones: Adding value to the Windows platform using online services (i.e. building something similar to iLife + .Mac for Windows users). 

From that perspective, the reorgs make a lot more sense. The goals and businesses are different enough that having people singularly focused on each of those tasks makes more sense than having one person worry about such disparate [and perhaps conflicting] goals. The interesting question to me is what this means for Microsoft's Web-based Windows Live properties like Windows Live Hotmail, Windows Live Favorites and Windows Live Spaces if Microsoft is going to be emphasizing the Windows in Windows Live. I guess we've already seen some announcements from the mail side, like Windows Live Mail and the Microsoft Office Outlook Connector now being free.

Another interesting question is where Ray Ozzie fits in all this.


 

Categories: Life in the B0rg Cube | MSN | Windows Live

These are my notes from the talk Scaling Google for Every User by Marissa Mayer.

Google search has lots of different users who vary in age, sex, location, education, expertise and a lot of other factors. After lots of research, it seems the only factor that really influences how different users view search relevance is their location.

One thing that does distinguish users is the difference between a novice search user and an expert user of search. Novice users typically type queries in natural language while expert users use keyword searches.

Example Novice and Expert Search User Queries

NOVICE QUERY: Why doesn't anyone carry an umbrella in Seattle?
EXPERT QUERY: weather seattle washington

NOVICE QUERY: can I hike in the seattle area?
EXPERT QUERY: hike seattle area

On average, it takes a new Google user 1 month to go from typing novice queries to being a search expert. This means that there is little payoff in optimizing the site to help novices since they become search experts in such a short time frame.

Design Philosophy

In general, when it comes to the PC user experience, the more features available the better the user experience. However when it comes to handheld devices, the curve is bell-shaped and there comes a point where adding extra features makes the user experience worse. At Google, they believe their experience is more like the latter and tend to hide features on the main page, only showing them when necessary (e.g. after the user has performed a search). This is in contrast to the portal strategy of the 1990s, when sites would list their entire product line on the front page.

When tasked with taking over the user interface for Google search, Marissa Mayer fell back on her AI background and focused on applying mathematical reasoning to the problem. Like Amazon, they decided to use split A/B testing to test the different changes they planned to make to the user interface and see which got the best reaction from their users. One example of the kind of experiments they've run came about when the founders asked whether they should switch from displaying 10 search results by default, given that Yahoo! was displaying 20 results. They'd only picked 10 results arbitrarily because that's what Alta Vista did. They ran some focus groups and the majority of users said they'd like to see more than 10 results per page. So they ran an experiment with 20, 25 and 30 results and were surprised at the outcome. After 6 weeks, 25% of the people who were getting 30 results used Google search less, while 20% of the people getting 20 results used the site less. The initial suspicion was that people weren't having to click the "next" button as much because they were getting more results, but further investigation showed that people rarely click that link anyway. Then the Google researchers realized that while it took 0.4 seconds on average to render 10 results, it took 0.9 seconds on average to render 25 results. This seemingly imperceptible lag was still enough to sour the experience of users to the point that they reduced their usage of the service.
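As an aside, the mechanics of this kind of split testing are fairly straightforward; the hard part is the measurement discipline. Below is a toy sketch of how users might be deterministically bucketed into experiment groups. The bucket percentages are made up for illustration and real systems are far more sophisticated.

// Hash each user into a fixed bucket so the same user consistently sees
// the same variant across visits
public static string GetResultCountVariant(string userId){
    int bucket = (userId.GetHashCode() & 0x7FFFFFFF) % 100;

    if (bucket < 1) return "20 results";  // 1% of users
    if (bucket < 2) return "25 results";  // next 1%
    if (bucket < 3) return "30 results";  // next 1%
    return "10 results";                  // everyone else sees the control
}

Comparing a metric like searches per user across the buckets after a few weeks is then what tells you whether a change helped or hurt.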

Improving Google Search

There are a number of factors that determine whether a user will find a set of search results to be relevant, including the query, the user's individual tastes, the task at hand and the user's locale. Locale is especially important because a query such as "GM" is likely to be a search for General Motors while a query such as "GM foods" is most likely seeking information about genetically modified foods. Given a large enough corpus of data, statistical inference can seem almost like artificial intelligence. Another example is that a search like b&b ab looks for bed and breakfasts in Alberta while ramstein ab locates the Ramstein Air Force Base. This is because b&b typically means bed and breakfast, so for a search like "b&b ab" it is assumed that the term after "b&b" is a place name, based on statistical inference over millions of such queries.

At Google they want to get even better at knowing what you mean instead of just looking at what you say. Here are some examples of user queries which Google will transform to other queries based on statistical inference [in future versions of the search engine]

User Query                                            Google Will Also Try This Query
unchanged lyrics van halen                            lyrics to unchained by van halen
how much does it cost for an exhaust system           cost exhaust system
overhead view of bellagio pool                        bellagio pool pictures
distance from zurich switzerland to lake como italy   train milan italy zurich switzerland

Performing query inference in this manner is a very large scale, ill-defined problem. Another effort Google is pursuing is cross-language information retrieval: if I perform a query in one language, it will be translated into a foreign language and the results will then be translated back into my language. This may not be particularly interesting for English speakers since most of the Web is in English, but it will be valuable for speakers of other languages (e.g. an Arabic speaker interested in reviews of New York City restaurants).

Google Universal Search was a revamp of the core engine to show results other than text-based URLs and website summaries in the search results (e.g. search for nosferatu). There were a number of challenges in building this functionality such as

  • Google's search verticals such as books, blog, news, video, and image search got a lot less traffic than the main search engine and originally couldn't handle receiving the same level of traffic as the main page.
  • How do you rank results across different media to figure out the most relevant? How do you decide a video result is more relevant than an image or a webpage? This problem was tackled by Udi Manber's team.
  • How do you integrate results from other media into the existing search result page? Should results be segregated by type or should it be a list ordered by relevance independent of media type? The current design was finally decided upon by Marissa Mayer's team but they will continue to incrementally improve it and measure the user reactions.

At Google, the belief is that the next big revolution is a search engine that understands what you want because it knows you. This means personalization is the next big frontier. A couple of years ago, the tech media was full of reports that a bunch of Stanford students had figured out how to make Google five times faster. This was actually incorrect. The students had figured out how to make PageRank calculations faster which doesn't really affect the speed of obtaining search results since PageRank is calculated offline. However this was still interesting to Google and the students' company was purchased. It turns out that making PageRank faster means that they can now calculate multiple PageRanks in the time it used to take to calculate a single PageRank (e.g. country specific PageRank, personal PageRank for a given user, etc). The aforementioned Stanford students now work on Google's personalized search efforts.
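For those who haven't seen it, below is a toy sketch of what "calculating PageRank" means: the classic power iteration over a link graph. It glosses over details like dangling pages, and the real computation runs offline over billions of pages, which is why making it faster (or being able to run many flavors of PageRank in the same time) matters.

// graph[i] holds the indexes of the pages that page i links to
public static double[] ComputePageRank(int[][] graph, int iterations){
    int n = graph.Length;
    double damping = 0.85;
    double[] rank = new double[n];
    for (int i = 0; i < n; i++) rank[i] = 1.0 / n; // start uniform

    for (int iter = 0; iter < iterations; iter++){
        double[] next = new double[n];
        for (int i = 0; i < n; i++) next[i] = (1 - damping) / n;

        // Each page distributes its current rank evenly among its outlinks
        for (int i = 0; i < n; i++){
            foreach (int target in graph[i]){
                next[target] += damping * rank[i] / graph[i].Length;
            }
        }
        rank = next;
    }
    return rank;
}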

Speaking of personalization, iGoogle has become their fastest growing product of all time. Allowing users to create a personalized page and then opening up the platform to developers such as Caleb to build gadgets lets them learn more about their users. Caleb's collection of gadgets garners about 30 million daily page views across various personalized homepages.

Q&A

Q: Does the focus on expert searchers mean that they de-emphasize natural language processing?
A: Yes, in the main search engine. However they do focus on it for their voice search product and they do believe it is unfortunate that users have to adapt to Google's keyword-based search style.

Q: How do the observations that are data mined about users' search habits get back into the core engine?
A: Most of it happens offline, not automatically. Personalized search is an exception; this data is uploaded periodically into the main engine to improve the results specific to that user.

Q: How well is the new Universal Search interface doing?
A: As well as Google Search is since it is now the Google search interface.

Q: What is the primary metric they look at during A/B testing?
A: It depends on what aspect of the service is being tested.

Q: Has there been user resistance to new features?
A: Not really. Google employees are actually more resistant to changes in the search interface than their average user.

Q: Why did they switch to showing Google Finance before Yahoo! Finance when showing search results for a stock ticker?
A: Links used to be ordered by ComScore metrics but since Google Finance shipped they decided to show their own service first. This is now a standard policy for Google search results that contain links to other services.

Q: How do they tell if they have bad results?
A: They have a bunch of watchdog services that track uptime for various servers to make sure a bad one isn't causing problems. In addition, they have 10,000 human evaluators who are always manually checking the relevance of various results.

Q: How do they deal with spam?
A: There are lots of definitions for spam: bad queries, bad results and email spam. For keeping out bad results they do automated link analysis (e.g. examining excessive numbers of links to a URL from a single domain or set of domains) and they use multiple user agents to detect cloaking.

Q: What percent of the Web is crawled?
A: They try to crawl most of it except that which is behind signins and product databases. And for product databases they now have Google Base and encourage people to upload their data there so it is accessible to Google.

Q: When will I be able to search using input other than search (e.g. find this tune or find the face in this photograph)?
A: We are still a long way from this. In academia, we now have experiments that show 50%-60% accuracy but that's a far cry from being a viable end user product. Customers don't want a search engine that gives relevant results half the time.


 

Categories: Trip Report

These are my notes from the talk Lessons in Building Scalable Systems by Reza Behforooz.

The Google Talk team has produced multiple versions of their application. There is

  • a desktop IM client which speaks the Jabber/XMPP protocol
  • a Web-based IM client that is integrated into GMail
  • a Web-based IM client that is integrated into Orkut
  • an IM widget which can be embedded in iGoogle or in any website that supports embedding Flash

Google Talk Server Challenges

The team has had to deal with a significant set of challenges since the service launched including

  • Supporting the display of online presence and the sending of messages for millions of users. Peak traffic is in hundreds of thousands of queries per second, with a daily average of billions of messages handled by the system.

  • Applying routing and application logic to each message according to the preferences of each user while keeping latency under 100ms.

  • Handling surges of traffic from integration with Orkut and GMail.

  • Ensuring in-order delivery of messages.

  • Needing an extensible architecture which could support a variety of clients.

Lessons

The most important lesson the Google Talk team learned is that you have to measure the right things. Questions like "how many active users do you have" and "how many IM messages does the system carry a day" may be good for evaluating marketshare but are not good questions from an engineering perspective if one is trying to get insight into how the system is performing.

Specifically, the biggest strain on the system actually turns out to be displaying presence information. The formula for determining how many presence notifications they send out is

total_number_of_connected_users * avg_buddy_list_size * avg_number_of_state_changes

Sometimes there are drastic jumps in these numbers. For example, integrating with Orkut increased the average buddy list size since people usually have more friends in a social networking service than they have IM buddies.
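Plugging made-up numbers into the formula shows why presence dwarfs actual messaging:

long connectedUsers   = 1000000;  // total_number_of_connected_users
long avgBuddyListSize = 20;       // avg_buddy_list_size
long avgStateChanges  = 10;       // avg_number_of_state_changes per day

// 1,000,000 * 20 * 10 = 200 million presence notifications per day
long notifications = connectedUsers * avgBuddyListSize * avgStateChanges;

With numbers like these, doubling the average buddy list size (which is effectively what the Orkut integration did) doubles the notification load without adding a single new user.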

Other lessons learned were

  1. Slowly Ramp Up High Traffic Partners: To see what real world usage patterns would look like when Google Talk was integrated with Orkut and GMail, both services added code to fetch online presence from the Google Talk servers for the pages that displayed a user's contacts, without adding any UI integration. This way the feature could be tested under real load without users noticing if there were capacity problems. In addition, the feature was rolled out to small groups of users at first (around 1%).

  2. Dynamic Repartitioning: In general, it is a good idea to divide user data across various servers (aka partitioning or sharding) to reduce bottlenecks and spread out the load. However, the infrastructure should support redistributing these partitions/shards without having to cause any downtime.

  3. Add Abstractions that Hide System Complexity: Partner services such as Orkut and GMail don't know which data centers contain the Google Talk servers, how many servers are in the Google Talk cluster and are oblivious of when or how load balancing, repartitioning or failover occurs in the Google Talk service.

  4. Understand Semantics of Low Level Libraries: Sometimes low level details can stick it to you. The Google Talk developers found that using epoll worked better than a poll/select loop because they have lots of open TCP connections but only a relatively small number of them are active at any time.

  5. Protect Against Operational Problems: Review logs and endeavor to smooth out spikes in activity graphs. Limit cascading problems by having logic to back off from using busy or sick servers (see the sketch after this list).

  6. Any Scalable System is a Distributed System: Apply the lessons from the fallacies of distributed computing. Add fault tolerance to all your components. Add profiling to live services and follow transactions as they flow through the system (preferably in a non-intrusive manner). Collect metrics from services for monitoring both for real time diagnosis and offline generation of reports.
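To illustrate the back-off logic mentioned in lesson 5, here is a minimal sketch of the idea: after a few consecutive failures, stop sending a server traffic for a cooldown period instead of hammering it and spreading the failure. The thresholds are arbitrary and a production version would need to be thread-safe.

using System;
using System.Collections.Generic;

public class ServerHealthTracker {

    private Dictionary<string, int> failureCounts = new Dictionary<string, int>();
    private Dictionary<string, DateTime> retryAfter = new Dictionary<string, DateTime>();

    // A server is usable unless it is inside its cooldown window
    public bool IsUsable(string server){
        DateTime until;
        return !retryAfter.TryGetValue(server, out until) || DateTime.UtcNow >= until;
    }

    public void RecordFailure(string server){
        int count;
        failureCounts.TryGetValue(server, out count);
        failureCounts[server] = ++count;

        if (count >= 3){
            // Three strikes: leave the server alone for 30 seconds
            retryAfter[server] = DateTime.UtcNow.AddSeconds(30);
            failureCounts[server] = 0;
        }
    }

    public void RecordSuccess(string server){
        failureCounts.Remove(server);
        retryAfter.Remove(server);
    }
}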

Recommended Software Development Strategies

Compatibility is very important, so making sure deployed binaries are backward and forward compatible is always a good idea. Giving developers access to live servers (ideally public beta servers, not main production servers) will encourage them to test and try out ideas quickly. It also gives them a sense of empowerment. Developers end up making their systems easier to deploy, configure, monitor, debug and maintain when they have a better idea of the end-to-end process.

Building an experimentation platform which allows you to empirically test the results of various changes to the service is also recommended.


 

Categories: Platforms | Trip Report