Yesterday I saw a blog post by John Battelle where he said he turned down a chance to be part of a discussion on a television show about whether Facebook is the next Google. This seems like a pretty absurd assertion on the face of it and the reason is snuggled somewhere in the middle of Robert Scoble's post Why Facebook, why now? where he writes:

Everything I do in Facebook is about interacting with people. For instance, at the top of my Facebook inbox right now is Ryan Coomer. The advertising next to him says “Try Forex Trading Today.” There is absolutely NO connection between who Ryan is and the advertising that’s put next to him.

Imagine if advertisers could “buy people.” I just clicked on Ryan’s profile, he’s into Running and Golf. Why don’t ads for running and golf gear get put onto his profile? Wouldn’t that make sense? He’s also a software developer. Where’s the Visual Studio advertisement? He’s into video games. Where’s the Halo 3 advertisement?

Translation: Facebook needs an advertising platform and it needs one in the worst way. I’m not going to even look at the ads until the ads are tied to the people on Facebook. Facebook knows what we’re into, put ads for those things onto our profiles and messages.

Robert Scoble is both right and wrong. He is right that Facebook needs an advertising platform but he is wrong about how it should be used. If Facebook has detailed demographic data about all its users, then it makes the most sense to show users what they are most interested in, not what the person they are currently interacting with on the site is most interested in. That's counterproductive. I hate running and golf; I like G-Unit and alcopops. Which ads does it make more sense to show me if I'm browsing Ryan Coomer's profile?

Anyway, that's beside the point. Let's go back and look at what made Google such a compelling investment when they had their IPO. The company had:

  1. An extremely popular website that served a key need for its users. 
  2. A way to make a metric ton of money from their popularity via search ads.

#1 is obvious but #2 needs further clarification. Search advertising created a paradigm shift in the advertising market in several ways. For one, it was the first time in the history of advertising that advertisers could conclusively calculate the ROI of their advertising campaigns, something other media (TV, newspapers, radio, sponsorships, etc) never offered. More importantly, it allowed advertisers to target people while they were in the process of making a buying decision (i.e. doing research). For the most part, advertising is intrusive, irrelevant and often just plain annoying. On the other hand, ads shown while you are doing research for a commercial transaction are exactly what you want. It is quite telling that the highest paying search terms are terms like "mortgage refinance", "dui" and "consolidate loans". When searching for those terms I want to see ads; in fact, the more ads the better. What is most interesting is that even though the majority of searches on the Web are non-commercial, search advertising was still a goldmine. It could only get more lucrative as more people got on the Web and advertisers eventually realized that their dollars needed to follow the eyeballs, especially since search advertising was an easier expense to justify than all their other ad campaigns.

Now we know what made Google, which has since turned itself into a major force in the software industry and on the Web, such an attractive investment back in the day. So what does Facebook have?

  1. An extremely popular website that served a key need for its users. 
  2. The hope that they will be able to make a metric ton of money from their popularity via highly targeted ads based on all the rich demographic data they have about their users.

The first implication of #2 is that Facebook hasn't figured out how to make money yet. More importantly, if they do figure out how to build an ad platform, it will likely be a system that is more complex than Google's AdWords and AdSense combined. When you think about it, AdWords is pretty simple: allow people to bid on search terms, then show their ads when people search using those terms. AdSense is similarly straightforward: use term extraction to convert the content on the page into the equivalent of search terms, then show ads from the people who bid on those terms. Of course, the devil is in the details, which are mainly about optimizing how much money can be extracted from advertisers.
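To make that comparison concrete, here's a deliberately naive Python sketch of both mechanisms as just described. The `Ad` type, the stop-word list, word-frequency "term extraction" and highest-bid-wins ranking are all my own simplifying assumptions; a real ad auction weighs far more than the bid.

```python
from collections import Counter
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Ad:
    advertiser: str
    term: str    # the (single-word, lowercase) search term this advertiser bid on
    bid: float   # hypothetical maximum price per click


# Hypothetical, tiny stop-word list used by the naive term extractor.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "is", "my"}


def search_ads(query, ads, limit=3):
    """AdWords-style: show ads whose bid term appears in the query, highest bid first."""
    words = set(query.lower().split())
    matches = [ad for ad in ads if ad.term in words]
    return sorted(matches, key=lambda ad: ad.bid, reverse=True)[:limit]


def extract_terms(page_text, top_n=3):
    """Naive term extraction: the most frequent non-stop-words on the page."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def content_ads(page_text, ads, limit=3):
    """AdSense-style: turn page content into pseudo search terms, then reuse search_ads."""
    return search_ads(" ".join(extract_terms(page_text)), ads, limit)
```

Note that the content-ads side is just one extra function layered on the search side, which is the point: both boil down to matching terms against bids.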

On the other hand, if you want to show display ads based on demographic data, you'll also need to build something very similar to a recommendation system on top of everything else I've described. You want to know that if I like wearing G-Unit clothing, I probably wouldn't mind seeing ads for Ecko Unlimited wear either; that if I like music by Young Jeezy, I'd probably like music from T.I.; or that if I've read Freakonomics, I probably wouldn't mind reading The Tipping Point.
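The "people who like X also like Y" inference can be illustrated with simple item-to-item co-occurrence counting. This is just one textbook recommendation technique picked for brevity, not a description of anything Facebook has built, and the sample likes in the usage below are made up.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(user_likes):
    """Count how often each ordered pair of items is liked by the same user."""
    co = Counter()
    for likes in user_likes.values():
        # sorted() keeps the pair enumeration deterministic
        for a, b in permutations(sorted(likes), 2):
            co[(a, b)] += 1
    return co


def recommend(liked, co, limit=3):
    """Score items the user hasn't liked by how often they co-occur with items they have."""
    scores = Counter()
    for (a, b), count in co.items():
        if a in liked and b not in liked:
            scores[b] += count
    return [item for item, _ in scores.most_common(limit)]


users = {
    "u1": {"Young Jeezy", "T.I."},
    "u2": {"Young Jeezy", "T.I.", "G-Unit"},
    "u3": {"Freakonomics", "The Tipping Point"},
    "u4": {"G-Unit", "Ecko Unlimited"},
}
co = build_cooccurrence(users)
print(recommend({"Young Jeezy"}, co))  # ['T.I.', 'G-Unit']
```

Even this toy version needs data about many users' tastes before it can say anything about one user, which is a different kind of problem from matching keywords against bids.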

Now let's assume that the developers at Facebook figure out how to build an advertising platform that is more complex than Google's AdWords and AdSense combined. There is still the question of whether the "spray and pray" approach that is the hallmark of display advertising [even if you are sure that the ads will be about topics the user is interested in] is a more lucrative model than showing people search ads when they are actually looking for something. I personally don't think it will be, but I'm no expert and I'm definitely no fortune teller.

I suspect that the folks at Facebook will eventually realize how hard it is to monetize their users in a way that justifies the exorbitant valuation that has now been placed on them. When that time comes, I wouldn't be surprised if they found some sucker to buy them based on the promise they present as opposed to the value they actually bring, much like when Yahoo! paid $5 billion for the company that gave us the Mark Cuban we all know and love, a deal that ended up as one of the top five worst billion dollar acquisitions of all time in the Internet space.


I should probably start out by pointing out that the title of this post is a lie. By definition, RESTful protocols cannot be truly SQL-like because they depend on Uniform Resource Identifiers (URIs aka URLs) for identifying resources. URIs on the Web are really just URLs, and URLs are really just hierarchical paths to a particular resource, similar to the paths on your local file system (e.g. /users/mark/bobapples, A:\Temp\car.jpeg). Fundamentally, URIs identify a single resource or a set of resources. On the other hand, SQL is primarily about dealing with relational data, which means you write queries that span multiple tables (i.e. resources). A syntax for addressing single resources (i.e. URLs/URIs) is fundamentally incompatible with a query language that operates over multiple resources. This was one of the primary reasons the W3C created XQuery even though we already had XPath.

That said, being able to perform sorting, filtering, and aggregate operations over a single set of resources via a URI is extremely useful and is a fundamental aspect of the Web today. As Sam Ruby points out in his blog post Etymology, a search results page is fundamentally RESTful even though its URI identifies a query as opposed to a specific resource or set of resources [although you could get meta and say it identifies the set of resources that meet your search criteria].

Both Google's Google Base data API and Microsoft's Project Astoria are RESTful protocols for performing sorting, filtering and aggregate operations similar to what you find in SQL over a hierarchical set of resources. What follows is an overview of the approaches taken by both protocols.

Filtering Results using Predicates

Although Astoria provides an abstraction over relational data, it does so in a way that supports the hierarchical nature of HTTP URIs. The primary resource, also known as an entity set, is placed at the root of the hierarchy (e.g. the set of all my customers) and each relationship to another set of resources is treated as another level in the hierarchy (e.g. each customer's orders). Each step in the hierarchy can be filtered using a predicate. Below are some query URLs and the results they return:

Results: All content areas in the Encarta online encyclopedia
Query: [name eq 'Geography']
Results: The content area whose name is 'Geography'
Query: [name eq 'Geography']/Articles
Results: All articles for the content area whose name is 'Geography'
Query: [name eq 'Geography']/Articles[Title eq 'Zimbabwe']
Results: The article with the title 'Zimbabwe' from the 'Geography' content area
Query: [name eq 'Geography']/Articles[761569370]
Results: The article from the 'Geography' content area with the ID 761569370
Query: [City eq 'London']/Orders[Freight gteq 100]
Results: All orders shipped to customers from the city of London who paid $100 or more in freight charges
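Since the full query URLs aren't reproduced above, here's a small sketch of how an Astoria-style path with a predicate at each step could be assembled and percent-encoded. The entity set name `Areas` and the choice to leave `/`, `[` and `]` unencoded are assumptions on my part; the service root is left out because it isn't given here.

```python
from urllib.parse import quote


def astoria_path(steps):
    """Build an Astoria-style path from (segment, predicate-or-None) steps.

    Each predicate is appended in square brackets, e.g.
    ("Articles", "Title eq 'Zimbabwe'") -> Articles[Title eq 'Zimbabwe'].
    """
    parts = [seg if pred is None else f"{seg}[{pred}]" for seg, pred in steps]
    # Percent-encode spaces and quotes, keep the path and predicate delimiters readable.
    return "/" + quote("/".join(parts), safe="/[]")


# "Areas" is a hypothetical entity set name used only for illustration.
print(astoria_path([("Areas", "name eq 'Geography'"), ("Articles", None)]))
# /Areas[name%20eq%20%27Geography%27]/Articles
```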

Google Base does not treat the data within it as a hierarchy. Instead, filters/predicates can be applied to one of two Atom feeds, which represent all the items within Google Base and all the items a specific user has stored within Google Base, respectively. The latter feed requires the HTTP request to be authenticated.

The first way one can filter results from a Google Base feed is by placing one or more categories as part of the path component. For example:

Results: All items from the 'hotels' category within Google Base
Results: All items from the 'jobs' or the 'personals' category within Google Base
Results: All items in Google Base except those from the 'recipes' category

The second way to filter results in a Google Base feed is by performing a full text query using the q query parameter. For example:

Results: All items within Google Base that contain the string 'atlanta' in one of their fields
Results: All items from the 'hotels' category within Google Base that contain the string 'atlanta' in any of their fields
Results: All items from the 'hotels' or 'housing' categories within Google Base that contain the string 'seattle' or 'atlanta' in any of their fields
Results: All items from the 'hotels' category within Google Base that contain the string 'washington' but not the string 'dc'

The final way to filter results from a Google Base feed is by applying a predicate on a field of the item using the bq query parameter. For example:

Results: All items from the 'hotels' category that have 'seattle' in their location field
Results: All items from the 'hotels' category that have 'seattle' in their location field and 'ramada' in any of their other fields
Query: [location:@"1 Microsoft Way, Redmond, WA, USA" + 5mi]
Results: All items from the 'hotels' category whose location is within 5 miles of "1 Microsoft Way, Redmond, WA, USA"
Query: [price(float USD)>=250.0 USD]
Results: All items from the 'products' category whose price is $250.00 or more and have 'zune' in one of their fields
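Here's a sketch of assembling Google Base-style query URLs from the three filtering mechanisms above. The feed URL is a hypothetical placeholder, and the category syntax (`/-/` followed by categories, `|` for OR, a leading `-` for NOT) is my understanding of the general GData category-query convention, so check it against the documentation before relying on it.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real Google Base feed URLs are not reproduced in this post.
BASE_FEED = "https://example.com/base/feeds/snippets"


def gbase_url(categories=(), exclude=(), q=None, bq=None, feed=BASE_FEED):
    """Build a Google Base-style query URL (assumed GData category syntax)."""
    segments = []
    if categories:
        segments.append("|".join(categories))      # OR across the listed categories
    segments.extend("-" + cat for cat in exclude)   # NOT for each excluded category
    url = feed + ("/-/" + "/".join(segments) if segments else "")
    # q is the full text query, bq the attribute predicate; both are percent-encoded.
    params = {name: value for name, value in (("q", q), ("bq", bq)) if value is not None}
    return url + ("?" + urlencode(params) if params else "")


print(gbase_url(categories=["hotels"], q="ramada", bq="[location:seattle]"))
```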

Supported Datatypes, Operators and Functions

As can be seen from the previous examples, both the Google Base data API and Astoria support matching string values in fields. Astoria supports the major SQL datatypes, a list of which can be obtained from the table describing the System.Data.SqlTypes namespace in the .NET Framework. The operations that can be performed on the various fields of an entity are the following comparisons:

eq: Equal
ne: Not equal
gt: Greater than
gteq: Greater than or equal
lt: Less than
lteq: Less than or equal
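To show what these comparison keywords mean, here's a toy evaluator that applies a single `Field op value` predicate to in-memory rows. It's a hypothetical illustration of the semantics, not Astoria's parser: it handles one comparison at a time, treats quoted literals as strings and everything else as numbers.

```python
import operator
import re

# Astoria comparison keywords mapped to Python's comparison functions.
OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gteq": operator.ge,
    "lt": operator.lt, "lteq": operator.le,
}

PREDICATE = re.compile(r"^\s*(\w+)\s+(eq|ne|gteq|gt|lteq|lt)\s+(.+?)\s*$")


def parse_literal(text):
    """Single-quoted text becomes a str; anything else is parsed as a number."""
    if len(text) >= 2 and text[0] == text[-1] == "'":
        return text[1:-1]
    return float(text)


def matches(row, predicate):
    """Evaluate one 'Field op value' predicate against a dict-like row."""
    m = PREDICATE.match(predicate)
    if m is None:
        raise ValueError(f"unsupported predicate: {predicate!r}")
    field, op, literal = m.groups()
    return OPS[op](row[field], parse_literal(literal))


def filter_rows(rows, predicate):
    return [row for row in rows if matches(row, predicate)]


orders = [{"OrderID": 1, "Freight": 150.0}, {"OrderID": 2, "Freight": 99.5}]
print(filter_rows(orders, "Freight gteq 100"))  # [{'OrderID': 1, 'Freight': 150.0}]
```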

The list of datatypes supported by Google Base is provided in the Google Base data API documentation topic on Attribute Types. In addition to the comparison operators supported by Astoria, the Google Base data API also supports

@"..." + Xmi: Convert the string in quotes to a geocoded location and match anything within a radius of X miles/kilometers/meters around it, depending on the unit of distance specified
name(type):X..Y: Test whether the value of the named attribute [and optional type] falls between X and Y (e.g. [event date range:2007-05-20..2007-05-25] matches all events which fall between both dates)
date-range << date-range: Test whether the date range on the right-hand side is a subset of the date range on the left-hand side
if boolean-expression then expression else expression: Works like an if...else statement in every programming language you've ever used
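The radius and range operators are easy to approximate client-side once you have coordinates. This sketch assumes the address has already been geocoded to latitude/longitude (the geocoding step isn't shown), uses the haversine great-circle formula with a mean Earth radius in miles, and assumes the `X..Y` range is inclusive; I haven't confirmed how Google Base treats the endpoints.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def distance_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(a))


def within_radius(items, center, radius_mi):
    """Keep items (dicts with 'lat'/'lon') within radius_mi of an already geocoded center."""
    lat0, lon0 = center
    return [it for it in items if distance_mi(lat0, lon0, it["lat"], it["lon"]) <= radius_mi]


def in_range(value, low, high):
    """The X..Y test, assumed inclusive; ISO-8601 date strings compare correctly as strings."""
    return low <= value <= high
```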

In addition to these operators, it turns out that the Google Base data API also supports a full-blown expression language for use within predicates. This includes a library of over 20 functions, from math functions like sin and cos to aggregation functions like sum and count, as well as more esoteric functions like dist, exists and join. Below are some queries which show these operators and functions in action:

Query: [item type:vehicles][location(location)]
Results: All vehicles whose listings contain the text 'sale', ordered by how geographically close they are to the city of Seattle, WA
Query: [event date range:2007-05-20..2007-05-25]
Results: All events that fall between May 20th 2007 and May 25th 2007


Sorting query results is often necessary when working with large amounts of data. Both Google Base data API and Astoria provide a way to indicate that the results should be sorted based on one or more fields. In Astoria, sorting is done using the $orderby query parameter. For example,

Results: All areas in the Encarta encyclopedia sorted alphabetically by their Name
Query: $orderby=OrderDate desc
Results: All customer orders sorted in descending order by order date
Results: All customer orders sorted by the required date and the cost of freight

The Google Base data API uses the orderby and sortorder query parameters to control sorting and sort order respectively. Examples are shown below

Results: All job listings containing the string 'program manager' sorted by the salary field
Query: [item type:vehicles][location(location)]
Results: All vehicles whose listings contain the text 'sale', ordered by how geographically close they are to the city of Seattle, WA
Query: [location:@"1 Microsoft Way, Redmond, WA, USA" + 5mi]&orderby=[x=bedrooms(int): if exists(x) then max(x) else 0]
Results: All items within the 'housing' category that are within 5 miles of Microsoft's headquarters, sorted by number of bedrooms. Items that don't have a bedrooms element are sorted using the value 0
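Multi-field sorting with a fallback for missing values, as in the bedrooms example, can be sketched in Python by relying on the fact that Python's sort is stable. The `(field, "asc"/"desc")` spec format is made up for this illustration, and substituting `0` for a missing field only makes sense for numeric fields.

```python
def order_by(items, spec, default=0):
    """Sort dicts by a spec like [("RequiredDate", "asc"), ("Freight", "desc")].

    Python's sort is stable (even with reverse=True), so sorting by the least
    significant key first and the most significant key last yields the combined
    ordering. A missing field is treated as `default`, mirroring
    'if exists(x) then ... else 0'.
    """
    result = list(items)
    for field, direction in reversed(spec):
        result.sort(key=lambda item: item.get(field, default), reverse=(direction == "desc"))
    return result


homes = [{"id": "a", "bedrooms": 3}, {"id": "b"}, {"id": "c", "bedrooms": 5}]
print([h["id"] for h in order_by(homes, [("bedrooms", "desc")])])  # ['c', 'a', 'b']
```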


When dealing with large numbers of items, it often isn't feasible to return all of them in a single XML document for a variety of reasons. Both the Google Base data API and Astoria provide mechanisms to retrieve results as multiple "pages".

In Astoria, this is done using a combination of the $top and $skip query parameters, which indicate the number of items to return and the number of items to skip before the list starts, respectively. Examples are shown below:

Results: All areas in the Encarta encyclopedia sorted alphabetically by their Name, restricted to only showing 3 items per page
Results: All areas in the Encarta encyclopedia sorted alphabetically by their Name starting from the second item, restricted to only showing 3 items per page

The Google Base data API uses the max-results and start-index query parameters to indicate the number of items to return and what item to start the list from, respectively. The default value of max-results is 25 while its maximum value is 250. The total number of results is emitted in the returned feed as the element openSearch:totalResults. Examples are shown below:

Results: All hotels within the Seattle area, restricted to 10 results per page
Results: All hotels within the Seattle area, restricted to 50 results per page starting from the hundredth result
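The two paging schemes differ mainly in where they start counting. Assuming Astoria's skip parameter counts items to skip (zero-based) while Google Base's start-index is one-based, converting a page number into query parameters looks something like this. The `$` prefixes, the one-based page numbering and the helper names are my assumptions; the 250 cap comes from the documentation cited above.

```python
import math

GBASE_MAX_RESULTS = 250  # documented maximum for max-results


def astoria_page(page, page_size):
    """Parameters for a 1-based page number using $skip (items to skip) and $top."""
    return {"$skip": (page - 1) * page_size, "$top": page_size}


def gbase_page(page, page_size):
    """Parameters for the same page using start-index (1-based) and max-results."""
    size = min(page_size, GBASE_MAX_RESULTS)  # the server caps max-results anyway
    return {"start-index": (page - 1) * size + 1, "max-results": size}


def page_count(total_results, page_size):
    """Pages needed to show openSearch:totalResults items."""
    return math.ceil(total_results / page_size)


print(gbase_page(3, 50))  # {'start-index': 101, 'max-results': 50}
```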

Astoria Specific Features

Using links within items that describe relationships is a core aspect of a RESTful protocol and is utilized by Astoria to show the foreign key relationships between rows/entities in the database. However, it can be cumbersome to have to make multiple requests and follow every link to get all the content related to an item. For this reason, Astoria includes the $expand query parameter, which automatically follows the links and retrieves the linked XML inline. Compare the following queries:

Query: [Name eq 'History']/Articles[Title eq 'Civil War, American']
Results: The encyclopedia article on the American Civil War, which has links to its Area, ArticleBody, Notes and RelatedArticles
Query: [Name eq 'History']/Articles[Title eq 'Civil War, American']?$expand=ArticleBody,Notes,RelatedArticles
Results: The encyclopedia article on the American Civil War with its ArticleBody, Notes and RelatedArticles shown inline as XML elements
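Conceptually, `$expand` replaces a link with the thing it points to. Here's a sketch of that idea on in-memory data, where each link is modeled as a key into a hypothetical `store` dictionary standing in for the URL a client would otherwise fetch with a separate request. The article contents are invented.

```python
def expand(entity, store, links):
    """Return a copy of `entity` with the named link properties replaced inline.

    entity[name] holds a link key (or a list of keys) into `store`; links that
    are not named in `links` are left as plain references.
    """
    result = dict(entity)  # leave the original entity untouched
    for name in links:
        ref = entity.get(name)
        if isinstance(ref, list):
            result[name] = [store[key] for key in ref]
        elif ref is not None:
            result[name] = store[ref]
    return result


store = {
    "body/1": {"text": "The American Civil War was ..."},
    "notes/1": {"text": "See also ..."},
    "art/2": {"Title": "Gettysburg, Battle of"},
}
article = {
    "Title": "Civil War, American",
    "Area": "area/History",
    "ArticleBody": "body/1",
    "Notes": "notes/1",
    "RelatedArticles": ["art/2"],
}
expanded = expand(article, store, ["ArticleBody", "Notes", "RelatedArticles"])
```

Here `Area` stays a link because it wasn't named, just as in the second query above.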

GData Specific Features

Google Base has a notion of adjusted query results. When this feature is enabled, Google Base will automatically use spelling correction, stemming and other tricks to try and match results. For example, if you perform a search for the value "female" in the gender field of an item, the query adjustment engine will know to also match the value "f" in the gender field of any corresponding items. The query adjustment engine applies its heuristics on queries for field names, field values and item categories. The documentation is contradictory as to whether this feature is enabled by default or has to be specifically enabled by the user of the API.

Another interesting feature is that the Google Base data API allows one to filter out repetitive results using a feature called "crowding". With this feature, limits can be placed on how many results that match certain criteria should be returned. See the following example for details:

Results: Return all restaurants stored within Google Base but show no more than 2 per cuisine type
NOTE: I actually couldn't get this feature to work using either the example queries from the documentation or queries I constructed. It is quite possible that this feature doesn't work but is so esoteric that no one has noticed.
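Since I couldn't get crowding to work, here's one way to emulate it on the client after fetching results: keep at most N items per value of some field, preserving the original order. This is my own emulation of the behavior the documentation describes, not how Google Base implements it.

```python
from collections import defaultdict


def crowd(items, key, max_per_key):
    """Keep at most `max_per_key` items for each value of `key`, preserving order."""
    seen = defaultdict(int)
    kept = []
    for item in items:
        k = item.get(key)
        if seen[k] < max_per_key:
            seen[k] += 1
            kept.append(item)
    return kept


restaurants = [
    {"name": "A", "cuisine": "thai"},
    {"name": "B", "cuisine": "thai"},
    {"name": "C", "cuisine": "thai"},
    {"name": "D", "cuisine": "italian"},
]
print([r["name"] for r in crowd(restaurants, "cuisine", 2)])  # ['A', 'B', 'D']
```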


In comparing both approaches there is a lot to like and dislike. I like the "expand" feature in Astoria as well as the fact that I can retrieve XML results from multiple paths of the hierarchy. However, there does seem to be a paucity of operators and functions for better filtering of results.

From the Google Base data API, I love the "crowd" feature and having a full library of functions for performing tests within predicates. Also, some of the operators, such as the ones for finding results near a certain location, are quite impressive although unnecessary for the majority of RESTful protocols out there. That said, I do think they went overboard on some of the features, such as having if...else blocks within the URIs. I suspect that some of that complexity wouldn't have been needed if they had used hierarchies instead of a flat namespace that requires complex filtering to get anything out of it.


From the Virtual Earth team's blog post entitled Mobile Search V2 released - Improved Navigation, Cache, Movie Searching, GPS, Traffic reporting and more! we learn

The Mobile Search team has released V2 of the rich client application for Windows Mobile, as well as a major update to the browser based interface. Whether you have a J2ME (Java) phone, Windows Mobile phone, or any other device with a mobile browser, Live Search has you covered with maps, directions and business search.
  • Movie Showtimes:  Want to see a movie but don’t know what’s playing?   Get Movie Showtimes near you and be on your way! 
  • More Local Data with Reviews:  Want to go out for dinner but not sure which restaurant to pick?  Let your fellow restaurant goers help you out – make a decision based on user ratings.
  • Maps:  View Mobile Virtual Earth maps wherever you are.  For improved performance, pop in your storage card to enable the large cache option. Street maps, Aerial and Hybrid are supported.
  • Directions: Lost?  Get found with better support for GPS integration and improved turn-by-turn navigation.  We’ll even prompt you to auto-reroute if you get lost! 
I use this all the time and it is now my favorite Windows Live application. It's a sign that we've come a long way that I now consider printing out driving directions from the Web to be "old school".

I've been using Yahoo's mobile service for movie times but now it looks like I won't be needing it any more. Sweet. What are you waiting for? Install it on your phone today.


Categories: Windows Live

July 12, 2007
@ 05:32 PM

Before starting this review I should probably point out that I'm a Transformers fan. Every car I've ever owned has had either an Autobot or a Decepticon insignia on the bumper, and I have most of the seasons of the original TV show on DVD. That said, when I first saw the trailers for Transformers I expected the movie to suck because it seemed like there were too many people and not enough robots, but boy, was I wrong. The movie freaking rocked. It had it all: comedy (the kid's parents being typical annoying, nosy parents), romance (nerd meets hot girl and impresses her with his cool car), action (soldiers shooting guns, giant robots shooting guns and Optimus Prime vs. Megatron), tragedy (***spoiler deleted***), product placement (GMC trucks, eBay[1]), quotes from the cartoon ("Autobots...roll out!", "Megatron must be stopped, no matter the cost!", "You've failed me yet again, Starscream") and foreshadowing of the sequel. At two and a half hours I thought the movie would be too long but it didn't seem that way at all. I loved it, my fiancée loved it and so did the kids. Definitely the most fun movie I've seen this summer.

I saw it on Tuesday evening and the theater was packed. This may have been because of the current heat wave forcing people out of their homes and into any air-conditioned building they could find or it could have been because the favorable reviews have gotten people curious about the movie. I suspect it was a little of both.

Rating: **** out of *****

[1] Is it me or does it seem like there is a crazy amount of product placement from online services these days? In the past month I've seen eBay featured prominently in movies like The 40-Year-Old Virgin and Transformers. I've heard songs about meeting girls that heavily feature MySpace from mainstream acts like Gym Class Heroes and T-Pain. I've seen Google used repeatedly in movies like The Holiday and Knocked Up...even crazier was when Spider-Man unmasked himself as Peter Parker at a press conference in The Amazing Spider-Man issue #533 and in the very next panel there was Google product placement. The only Microsoft mention I've seen in pop culture was some teens playing Gears of War in Live Free or Die Hard, but neither the name of the game nor the Xbox is ever mentioned. Our marketing guys have some 'splaining to do.


Categories: Movie Review

In my previous post I mentioned the various problems with relying on incubation teams to bring innovation into a product or organization. The obvious follow-up question is: if carving off some subset of your team to work on the "next big thing" while the rest of your employees work on the boring bread and butter product(s) that pay the bills doesn't work, how do you revitalize an organization's products and make them innovative?

My advice is to look at companies within your industry that are considered innovative and see what you can learn from them. One such company is Google, which many in the software industry consider to be the most innovative company on Earth. A number of Google's competitors have several internal groups whose job is to "incubate ideas" and foster innovation (for example, Yahoo! has Brickhouse and Yahoo! Research while Microsoft has Microsoft Research, Live Labs, Search Labs, and Windows Live Core among others), yet it seems that Google is the company most associated with innovation in the online space.

Below are some of the ways technology companies can follow Google's example without resorting to its more eccentric ways of motivating employees, like free food prepared by gourmet chefs and on-site massages, dry cleaning and oil changes.

  1. Everyone is Responsible for Innovation: There are several ways Google has created a culture where every technical employee feels that innovation is expected of them. First, there is the strong preference for people who have a track record of producing original ideas, such as Ph.D.s [who are required to produce original research which advances the state of the art as part of their thesis] and founders of Open Source projects (e.g. Spencer Kimball (GIMP), Aaron Boodman (Greasemonkey), and Guido van Rossum (Python)). Secondly, employees are strongly encouraged but not required to spend 20% of their time on projects of their own design which are intended to benefit the company and/or its customers. Not only does this give employees a productive outlet for their creativity, working on multiple projects at once gives developers a broader world view which makes it less likely that they will develop tunnel vision with regard to their primary project. Finally, Google has a single code base for all of their projects and developers are strongly encouraged to fix bugs or add features to any Google product they want even if they are not on the product team. This attitude encourages the cross-pollination of ideas across the company and encourages members of the various product teams to keep an open mind about ideas from outside their particular box.

  2. Good Ideas Often Come from Outside your Box: A lot of people in the software industry criticize Microsoft for its practice of innovation through acquisition and have compiled lists of Microsoft's innovations that were actually acquisitions, but the fact is that the road to success lies in being able to spot good ideas whether they come from within your company or without. Google is no exception to this rule, as the following table of acquisitions and the Google products they resulted in shows:

    Acquired Company/Product | Google Product
    Applied Semantics | Google AdSense
    Kaltix | Google Personalized Search
    Keyhole Corp. | Google Earth
    Where2 | Google Maps
    ZipDash | Google Ride Finder
    2Web Technologies | Google Spreadsheets
    Upstartle | Google Docs
    Urchin Software Corporation + Measure Map | Google Analytics
    Zenter + Tonic Systems | Unreleased Google Web-based Presentation application

    As you can see from the above list, a lot of Google's much lauded products were actually the products of acquisitions as opposed to the results of internal incubation. Being able to conquer the NIH ("not invented here") mentality is important if one wants to ensure that the products that exhibit the best ideas are produced by your company, because quite often those ideas won't originate from your company.

  3. Force Competition to Face the Innovator's Dilemma: One reason that a number of Google's products are considered innovative is that they challenge pre-existing notions about software in certain categories. For example, when Gmail [a product of an engineer's 20% time spent on side projects] was first launched it was a shock to see a free Web-based email service give users 1 gigabyte of free storage. A key reason this was a shock was that most free Web-based email services gave users less than a hundredth of that amount of storage. This was because the business model for free email was primarily to give users a crappy user experience (2MB of storage, obnoxious advertising, etc) and then charge them for upgrading to a decent experience. Thus there was little incentive for the major players in the free email business to give free users lots of storage or a rich online experience, because that isn't how the business worked. Another example that is likely to be a classic case study of the innovator's dilemma in the years to come is Google Docs & Spreadsheets vs. Microsoft Office. From the Wikipedia article on disruptive technology:

    In low-end disruption, the disruptor is focused initially on serving the least profitable customer, who is happy with a good enough product. This type of customer is not willing to pay premium for enhancements in product functionality. Once the disruptor has gained foot hold in this customer segment, it seeks to improve its profit margin. To get higher profit margins, the disruptor needs to enter the segment where the customer is willing to pay a little more for higher quality. To ensure this quality in its product, the disruptor needs to innovate. The incumbent will not do much to retain its share in a not so profitable segment, and will move up-market and focus on its more attractive customers. After a number of such encounters, the incumbent is squeezed into smaller markets than it was previously serving. And then finally the disruptive technology meets the demands of the most profitable segment and drives the established company out of the market.

    As someone who now maintains several wedding lists in collaboration with his future spouse, I can say without a doubt that universal access to our files from any computer, without my fiancée or me having to install or purchase any software, is head and shoulders beyond the solution provided by traditional desktop productivity suites. In addition, it is quite clear that Google will move to address the remaining gaps in time (see Google Gears), so we are likely on the cusp of seeing a multi-billion dollar software category undergo upheaval in the next few years.

    The main lesson here is Change the Game. Do not play by the rules that favor your competitors.

The key thing for people wanting to learn from Google's practices isn't to follow each of Google's specific policies but instead to understand the philosophy behind their practices then apply those philosophies in your specific context.


Marc Andreessen (whose blog is on fire!) has a rather lengthy but excellent blog post entitled The Pmarca Guide to Big Companies, part 2: Retaining great people which has some good advice on how big companies can retain their best employees. The most interesting aspects of his post were some of the accurate observations he had about obviously bad ideas that big companies implement which are intended to retain their best employees but end up backfiring. I thought these insights were valuable enough that they are worth repeating.

Marc writes

Don't create a new group or organization within your company whose job is "innovation". This takes various forms, but it happens reasonably often when a big company gets into product trouble, and it's hugely damaging.

Here's why:

First, you send the terrible message to the rest of the organization that they're not supposed to innovate.

Second, you send the terrible message to the rest of the organization that you think they're the B team.

That's a one-two punch that will seriously screw things up.

This is so true. Every time I've seen some executive or management higher-up create an incubation or innovation team within a specific product group, it has led to demoralization of the people who have been relegated to the "B team" and bad blood between both teams, which eventually leads to in-fighting. All of this might be worth it if these efforts were successful but, as Clayton Christensen pointed out in his interview in Business Week on the tenth anniversary of "The Innovator's Dilemma":

People come up with lots of new ideas, but nothing happens. They get very disillusioned. Never does an idea pop out of a person's head as a completely fleshed-out business plan. It has to go through a process that will get approved and funded. You're not two weeks into the process until you realize, "gosh, the sales force is not going to sell this thing," and you change the economics. Then two weeks later, marketing says they won't support it because it doesn't fit the brand, so we've got to change the whole concept.

All those forces act to make the idea conform to the company's existing business model, not to the marketplace. And that's the rub. So the senior managers today, thirsty for innovation, stand at the outlet of this pipe, see the dribbling out of me-too innovation after me-too innovation, and they scream up to the back end, "Hey, you guys, get more innovative! We need more and better innovative ideas!" But that's not the problem. The problem is this shaping process that conforms all these innovative ideas to the current business model of the company.

This is something I've seen happen time after time. There are times when incubation/innovation teams produce worthwhile results, but they are few and far between, especially compared to the number of such teams that exist. In addition, even in those cases both of Marc's observations still held, and they led to infighting between the teams which damaged the overall health of the product, the people and the organization.

Marc also wrote

Don't do arbitrary large spot bonuses or restricted stock grants to try to give a small number of people huge financial upside.

An example is the Google Founders' Awards program, which Google has largely stopped, and which didn't work anyway.

It sounds like a great idea at the time, but it causes a severe backlash among both the normal people who don't get it (who feel like they're the B team) and the great people who don't get it (who feel like they've been screwed).

Significantly differentiated financial rewards for your "best employees" are a seductive idea for executives, but they rarely work as planned for several reasons. One reason is based on an observation I first saw in Paul Graham's essay Hiring is Obsolete: big companies don't know how to value the contributions of individual employees. Robert Scoble often used to complain in the comments on his blog that he made less than six figures at Microsoft. I personally think he did more for the company's image than the millions we've spent on high-priced public relations and advertising firms. Yet it is incredibly difficult to prove this, and even if one could, the process wouldn't scale to every single employee. Then there's all the research from various corporations that have used social network analysis and found that their most valuable employees are rarely the ones high up in the org chart (see How Org Charts Lie published by the Harvard Business School). The second reason that heavily rewarding your "best employees" ends up being problematic is well described in Joel Spolsky's article Incentive Pay Considered Harmful, where he points out

Most people think that they do pretty good work (even if they don't). It's just a little trick our minds play on us to keep life bearable. So if everybody thinks they do good work, and the reviews are merely correct (which is not very easy to achieve), then most people will be disappointed by their reviews

When you combine the above observation with rewarding those who get good reviews disproportionately more than those who just did OK, it can lead to problems. For example, what happens when a company decides that it will give millions of dollars in bonuses to the employees who "add the most value" to the company? Hey, isn't that what the Google Founders' Awards were supposed to be? How did that turn out?

The company has continually tinkered with its incentives for people to stay. Early on Page and Brin gave "Founders' Awards" in cash to people who made significant contributions. The handful of employees who pulled off the unusual Dutch auction public offering in August 2004 shared $10 million. The idea was to replicate the windfall rewards of a startup, but it backfired because those who didn't get them felt overlooked. "It ended up pissing way more people off," says one veteran.

Google rarely gives Founders' Awards now, preferring to dole out smaller executive awards, often augmented by in-person visits by Page and Brin. "We are still trying to capture the energy of a startup," says Bock.

Another seductive idea that sounds good on paper but falls apart when you actually add human beings to the equation.


In my previous post, I mentioned that I'm in the early stages of building an application on the Facebook platform. I haven't yet decided on an application but for now, let's assume that it is a Favorite Comic Books application which allows me to store my favorite comic books and shows me the most popular comic books among my friends.

After investigating using Amazon's EC2 + S3 to build my application, I've decided that I'm better off using a traditional hosting solution running on either the LAMP or WISC platform. One of the things I've been looking at is which platform has better out-of-the-box support for an in-memory caching solution that works well in the context of a Web farm (i.e., multiple Web servers). While working on the platforms behind several high traffic Windows Live services, I've learned that you should be prepared to deal with scalability issues, and caching is one of the best ways to get bang for the buck when improving the scalability of your service.

I recently discovered memcached, a distributed object caching system originally developed by Brad Fitzpatrick of LiveJournal fame. You can think of memcached as a giant hash table spread across multiple servers: it handles balancing the objects hashed to each server and transparently fetches or removes objects over the network when they aren't on the machine that is accessing the hash table. (Strictly speaking, the memcached servers don't know about each other; the client library decides which server holds each key.) Although this sounds fairly simple, there is a lot of grunt work in building a distributed object cache that handles data partitioning across multiple servers and hides the distributed nature of the cache from the developer. memcached is well integrated into the typical LAMP stack and is used by a surprising number of high traffic websites including Slashdot, Facebook, Digg, Flickr and Wikipedia. Below is what C# code that uses memcached would look like, sans exception handling code

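public ArrayList GetFriends(int user_id){

    ArrayList friends = (ArrayList) myCache.Get("friendslist:" + user_id);

    if(friends == null){
        // Cache miss: open the connection and load the list from the database
        dbConnection.Open();

        // Parameterized query (the original concatenated the literal text "user_id")
        SqlCommand cmd = new SqlCommand("select friend_id from friends_list where owner_id=@owner_id", dbConnection);
        cmd.Parameters.AddWithValue("@owner_id", user_id);

        SqlDataReader reader = cmd.ExecuteReader();

        // Add each friend ID to the list
        friends = new ArrayList();
        while (reader.Read()){
            friends.Add(reader.GetInt32(0));
        }
        reader.Close();
        dbConnection.Close();

        // Store the list so later requests skip the database
        myCache.Set("friendslist:" + user_id, friends);
    }

    return friends;
}

public void AddFriend(int user_id, int new_friend_id){

    // Open the connection
    dbConnection.Open();

    SqlCommand cmd = new SqlCommand("insert into friends_list (owner_id, friend_id) values (@owner_id, @friend_id)", dbConnection);
    cmd.Parameters.AddWithValue("@owner_id", user_id);
    cmd.Parameters.AddWithValue("@friend_id", new_friend_id);
    cmd.ExecuteNonQuery();

    //remove key from cache since friends list has been updated
    myCache.Delete("friendslist:" + user_id);

    dbConnection.Close();
}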

The benefits of using the cache should be pretty obvious. I no longer need to hit the database after the first request to retrieve the user's friends list, which means faster performance when servicing the request and less I/O. memcached automatically handles purging items from the cache when it hits its size limit (it evicts the least recently used items), and the client library decides which cache server should hold each key<->value pair.
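Since the memcached servers don't coordinate with each other, the work of spreading keys across them falls to the client. Below is a minimal sketch of consistent hashing, one common scheme memcached client libraries offer for this. It is not the code of any particular client: the HashRing class, the server names and the choice of 100 virtual nodes per server are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical client-side helper: decides which cache server owns a key.
public class HashRing
{
    // Ring position -> server name, kept sorted by position.
    private readonly SortedList<uint, string> ring = new SortedList<uint, string>();
    private readonly int virtualNodes;

    public HashRing(IEnumerable<string> servers, int virtualNodes = 100)
    {
        this.virtualNodes = virtualNodes;
        foreach (string server in servers)
            AddServer(server);
    }

    // Each server gets many points on the ring so keys spread out evenly.
    public void AddServer(string server)
    {
        for (int i = 0; i < virtualNodes; i++)
            ring[Hash(server + "#" + i)] = server;
    }

    public void RemoveServer(string server)
    {
        for (int i = 0; i < virtualNodes; i++)
            ring.Remove(Hash(server + "#" + i));
    }

    // A key belongs to the first server point at or after the key's hash,
    // wrapping around to the start of the ring.
    public string GetServer(string key)
    {
        if (ring.Count == 0)
            throw new InvalidOperationException("No cache servers configured.");

        uint h = Hash(key);
        IList<uint> points = ring.Keys;
        int lo = 0, hi = points.Count;
        while (lo < hi)   // binary search for the first point >= h
        {
            int mid = lo + (hi - lo) / 2;
            if (points[mid] < h) lo = mid + 1; else hi = mid;
        }
        return ring.Values[lo == points.Count ? 0 : lo];
    }

    // First four bytes of an MD5 digest: the same on every Web server,
    // unlike string.GetHashCode(), which can differ between processes.
    private static uint Hash(string s)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(s));
            return BitConverter.ToUInt32(digest, 0);
        }
    }
}
```

With this scheme, taking one server out of the pool only remaps the keys that server owned. With the naive hash(key) % serverCount approach, going from three servers to two would remap about two thirds of the keys, which amounts to flushing most of the cache.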

I hang with a number of Web developers on the WISC platform and I don't think I've ever heard anyone mention memcached or anything like it. In fact, I couldn't find a mention of it on Microsoft employee blogs, ASP.NET developer blogs or on MSDN. So I wondered what the average WISC developer uses as their in-memory caching solution.

After looking around a bit, I came to the conclusion that most WISC developers use the built-in ASP.NET caching features. ASP.NET provides a number of in-memory caching features, including a Cache class which provides a similar API to memcached, page directives for caching portions of the page or the entire page, and the ability to create dependencies between cached objects and the files or database tables/rows that they were populated from via the CacheDependency and SqlCacheDependency classes. Although some of these features are also available in various Open Source web development frameworks such as Ruby on Rails + memcached, none gives as much functionality out of the box as ASP.NET, or so it seems.

Below is what the code for the GetFriends and AddFriend methods would look like using the built-in ASP.NET caching features

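public ArrayList GetFriends(int user_id){

    ArrayList friends = (ArrayList) Cache.Get("friendslist:" + user_id);

    if(friends == null){
        // Open the connection
        dbConnection.Open();

        // Query notifications need an explicit column list and two-part table names
        SqlCommand cmd = new SqlCommand("select friend_id from dbo.friends_list where owner_id=@owner_id", dbConnection);
        cmd.Parameters.AddWithValue("@owner_id", user_id);

        // Create the dependency before executing the command so it tracks these results
        // (SqlDependency.Start must have been called once at application startup)
        SqlCacheDependency dependency = new SqlCacheDependency(cmd);
        SqlDataReader reader = cmd.ExecuteReader();

        // Add each friend ID to the list
        friends = new ArrayList();
        while (reader.Read()){
            friends.Add(reader.GetInt32(0));
        }
        reader.Close();
        dbConnection.Close();

        //insert friends list into cache with associated dependency
        Cache.Insert("friendslist:" + user_id, friends, dependency);
    }

    return friends;
}

public void AddFriend(int user_id, int new_friend_id){
    // Open the connection
    dbConnection.Open();

    SqlCommand cmd = new SqlCommand("insert into dbo.friends_list (owner_id, friend_id) values (@owner_id, @friend_id)", dbConnection);
    cmd.Parameters.AddWithValue("@owner_id", user_id);
    cmd.Parameters.AddWithValue("@friend_id", new_friend_id);
    cmd.ExecuteNonQuery();

    /* no need to remove from cache because SqlCacheDependency takes care of that automatically */
    // Cache.Remove("friendslist:" + user_id);

    dbConnection.Close();
}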

Using the SqlCacheDependency class gets around a significant limitation of the ASP.NET Cache class. Specifically, the cache is not distributed. This means that if you have multiple Web front ends, you'd have to write your own code to handle partitioning data and invalidating caches across your various Web server instances. In fact, there are numerous articles showing how to implement such a solution including Synchronizing the ASP.NET Cache across AppDomains and Web Farms by Peter Bromberg and Use Data Caching Techniques to Boost Performance and Ensure Synchronization by David Burgett.

However, let's consider how SqlCacheDependency is implemented. If you are using SQL Server 7 or SQL Server 2000, your ASP.NET process polls the database at regular intervals to determine whether the tables the original query read from have changed. For SQL Server 2005, the database can be configured to send change notifications to the Web servers when the results of the original query change. Either way, the database is doing work to determine whether the data has changed. Compared to memcached, this still doesn't seem as efficient as it could be if we want to eke every last bit of performance out of the system, although it does lead to simpler code.
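To make the polling approach concrete, here is a simplified, in-memory sketch of table-level polling invalidation. It is not ASP.NET's implementation: ChangeTracker is a hypothetical stand-in for the per-table change counters the database maintains (on SQL Server 2000, ASP.NET relies on triggers that record changes in a notification table), and PollingCache is hypothetical too. What it illustrates is that every poll costs a database read whether or not anything changed, and that a write to any row invalidates every cached entry built from that table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a per-table change counter kept in the database.
public class ChangeTracker
{
    private readonly Dictionary<string, long> versions = new Dictionary<string, long>();

    // Number of times the "database" was asked for a version (the polling cost).
    public int Reads { get; private set; }

    // Called on every insert, update or delete against the table.
    public void RecordChange(string table)
    {
        versions.TryGetValue(table, out long v);
        versions[table] = v + 1;
    }

    public long GetVersion(string table)
    {
        Reads++;
        versions.TryGetValue(table, out long v);
        return v;
    }
}

// Hypothetical cache that remembers which table each entry came from.
public class PollingCache
{
    private class Entry { public object Value; public string Table; public long Version; }

    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();
    private readonly ChangeTracker tracker;

    public PollingCache(ChangeTracker tracker) { this.tracker = tracker; }

    // Record the table's version at the moment the value was loaded.
    public void Insert(string key, object value, string table)
    {
        entries[key] = new Entry { Value = value, Table = table, Version = tracker.GetVersion(table) };
    }

    public object Get(string key)
    {
        return entries.TryGetValue(key, out Entry e) ? e.Value : null;
    }

    // Would run on a timer: one version read per watched table, changed or not.
    public void Poll()
    {
        var current = new Dictionary<string, long>();
        var stale = new List<string>();
        foreach (var pair in entries)
        {
            string table = pair.Value.Table;
            if (!current.ContainsKey(table))
                current[table] = tracker.GetVersion(table);
            if (current[table] != pair.Value.Version)
                stale.Add(pair.Key);
        }
        foreach (string key in stale)
            entries.Remove(key);
    }
}
```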

If you are a developer on the WISC platform and are concerned about getting the best performance out of your Web site, you should take a look at memcached for Win32. The most highly trafficked site on the WISC platform is probably MySpace, and in articles about how their platform works, such as Inside, they extol the virtues of moving work out of the database and relying on cache servers.


Categories: Platforms | Programming | Web Development

In my efforts to learn more about Web development and what it is like for startups adopting Web platforms, I've decided to build an application on the Facebook platform. I haven't yet decided on an application but for the sake of argument let's say it is a Favorite Comic Books application which allows me to store my favorite comic books and shows me the most popular comic books among my friends.

The platform requirements for the application seem pretty straightforward. I'll need a database and some RESTful Web services, written in my language of choice, which give the widget access to the database. I'll also need to write the widget in FBML, which will likely mean I'll have to host images and CSS files as well. So far nothing seems particularly esoteric.

Since I didn't want my little experiment to eventually cost me a lot of money, I thought this was an excellent time to try out Amazon's Simple Storage Service (S3) and Elastic Compute Cloud (EC2), since I'd pay only for the resources I use instead of a flat hosting fee.

However, it seems that supporting this fairly straightforward application is beyond the current capabilities of EC2 + S3. S3 is primarily geared towards file storage, so although it is a good choice for cheaply hosting images and CSS stylesheets, it's not a good choice for storing relational or structured data. If I only needed to search within a single user's data (e.g., just my favorite comics), I could store it all in a single XML file and then use XPath to find what I was looking for. However, my application will need to perform aggregate queries across multiple users' data (i.e., looking at the favorite comics of all of my friends and then fetching the most popular ones), so a file-based solution isn't a good fit. I really want a relational database.
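The aggregate query in question is the part a relational database makes easy. A minimal in-memory sketch using LINQ shows its shape; the Favorite type and the MostPopularAmongFriends method are hypothetical. In SQL this would roughly be a join between a favorites table and the friends list, followed by GROUP BY comic ORDER BY COUNT(*) DESC, which is exactly what is painful to do over a pile of per-user XML files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: one row per (user, favorite comic) pair.
public class Favorite
{
    public int UserId { get; set; }
    public string Comic { get; set; }
}

public static class ComicQueries
{
    // Count how many friends list each comic and return the top few.
    public static List<string> MostPopularAmongFriends(
        IEnumerable<Favorite> favorites, ISet<int> friendIds, int top)
    {
        return favorites
            .Where(f => friendIds.Contains(f.UserId))       // only my friends' rows
            .GroupBy(f => f.Comic)                          // GROUP BY comic
            .OrderByDescending(g => g.Count())              // ORDER BY COUNT(*) DESC
            .ThenBy(g => g.Key, StringComparer.Ordinal)     // deterministic tie-break
            .Take(top)
            .Select(g => g.Key)
            .ToList();
    }
}
```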

EC2 seemed really promising because I could create a virtual server running in Amazon's cloud and load it up with my choice of operating system, database and Web development tools. Unfortunately, there was a fly in the ointment. There is no persistent storage in EC2, so if your virtual server goes down for any reason, such as taking it down to install security patches or a system crash, all your data is lost.

This is a well known problem within the EC2 community which has resulted in a bunch of clever hacks being proposed by a number of parties. In his post entitled Amazon EC2, MySQL, Amazon S3 Jeff Barr of Amazon writes

I was on a conference call yesterday and the topic of ways to store persistent data when using Amazon EC2 came up a couple of times. It would be really cool to have a persistent instance of a relational database like MySQL but there's nothing like that around at the moment. An instance can have a copy of MySQL installed and can store as much data as it would like (subject to the 160GB size limit for the virtual disk drive) but there's no way to ensure that the data is backed up in case the instance terminates without warning.

Or is there?

It is fairly easy to configure multiple instances of MySQL in a number of master-slave, master-master, and other topologies. The master instances produce a transaction log each time a change is made to a database record. The slaves or co-masters keep an open connection to the master, reading the changes as they are logged and mimicking the change on the local copy. There can be some replication delay for various reasons, but the slaves have all of the information needed to maintain exact copies of the database tables on the master.

Besides the added complexity this places on the application, it still isn't foolproof, as is pointed out in various comments in response to Jeff's post.
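The replication Jeff describes boils down to a slave replaying the master's change log. The toy in-memory sketch below is not how MySQL's binary log works internally; Master, Slave and LogEntry are hypothetical types. It also shows one reason asynchronous replication isn't foolproof: until the slave catches up, recent writes exist only on the master, so they are lost if the master's instance terminates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical change record written to the master's log.
public class LogEntry
{
    public string Key;
    public string Value;   // null means the row was deleted
}

public class Master
{
    public readonly Dictionary<string, string> Data = new Dictionary<string, string>();
    public readonly List<LogEntry> Log = new List<LogEntry>();

    // Every change is applied locally and appended to the log.
    public void Write(string key, string value)
    {
        if (value == null) Data.Remove(key); else Data[key] = value;
        Log.Add(new LogEntry { Key = key, Value = value });
    }
}

public class Slave
{
    public readonly Dictionary<string, string> Data = new Dictionary<string, string>();
    private int position;   // index of the next log entry to replay

    // Read entries logged since the last call and mimic them locally.
    // Returns how many entries were applied.
    public int CatchUp(Master master)
    {
        int applied = 0;
        while (position < master.Log.Count)
        {
            LogEntry e = master.Log[position++];
            if (e.Value == null) Data.Remove(e.Key); else Data[e.Key] = e.Value;
            applied++;
        }
        return applied;
    }
}
```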

Demitrious Kelly, who also recognizes the problems with relying on replication to solve the persistence problem, proposed an alternate solution in his post MySQL on Amazon EC2 (my thoughts) where he writes

Step #2: I’m half the instance I used to be! With each AMI you get 160GB of (mutable) disk space, and almost 2GB of ram, and the equivalent of a Xeon 1.75Ghz processor. Now divide that, roughly, in half. You’ve done that little math exercise because your one AMI is going to act as 2 AMI's. That's right. I’m recommending running two separate instances of MySQL on the single server.

Before you start shouting at the heretic, hear me out!

+-----------+   +-----------+
| Server A  |   | Server B  |
+-----------+   +-----------+
| My  | My  |   | My  | My  |
| sQ  | sQ  |   | sQ  | sQ  |
| l   | l   |   | l   | l   |
|     |     |   |     |     |
| #2<=== #1 <===> #1 ===>#2 |
|     |     |   |     |     |
+ - - - - - +   + - - - - - +

On each of our servers, MySQL #1 and #2 both occupy a max of 70GB of space. The MySQL #1 instances of all the servers are set up in a master-master topology. And the #2 instance is set up as a slave only of the #1 instance on the same server. So on server A, MySQL #2 is a copy (one way) of #1 on server A.

With the above setup *if* server B were to get restarted for some reason you could: A) shut down the MySQL instance #2 on server A. Copy that MySQL #2 over to both slots on server B. Bring up #1 on server B (there should be no need to reconfigure its replication relationship because #2 pointed at #1 on server A already). Bring up #2 on server B, and reconfigure replication to pull from #1 on server B. This whole time #1 on Server A never went down. Your services were never disrupted.

Also with the setup above it is possible (and advised) to regularly shut down #2 and copy it into S3. This gives you one more layer of fault tolerance (and, I might add, the ability to backup without going down.)

Both solutions are fairly complicated and error prone, and they still don't give you as much reliability as you would get if you simply had a hard disk that didn't lose all its data when the server rebooted or went down. At this point it is clear that a traditional hosting solution is the way to go. Any good suggestions for server-side LAMP or WISC hosting that won't cost an arm and a leg? Is Joyent any good?

PS: This is clearly a significant problem for Amazon's grid computing play, and one that has to be fixed if the company is serious about getting into the grid computing game and providing a viable alternative to startups looking for a platform to build the next "Web 2.0" hit. Building a large scale, distributed, relational database and then making it available to developers as a platform is unprecedented, so they have their work cut out for them. I'd incorrectly assumed that BigTable was the precedent for this, but I've since learned that BigTable is more like a large scale, distributed spreadsheet than a relational database. This explains a lot of the characteristics of the query API of Google Base.


Categories: Web Development

None of these was worth an entire post.

  1. Universal Music Group Refuses to Renew Apple's Annual License to Sell Their Music on iTunes: So this is what it looks like when an industry that has existed for decades begins to die. I wonder who's going to lose out more: Apple, because some people stop buying iPods since they can't buy music from Jay-Z and Eminem on iTunes, or Universal Music Group, for closing itself out of the biggest digital music marketplace in the world in the midst of declining CD sales worldwide? It's as if the record labels are determined to make themselves irrelevant by any means necessary.

  2. Standard URLs - Proposal for a Web with Less Search: Wouldn't it be cool if every website in the world used the exact same URL structure based on some ghetto reimplementation of the Dewey Decimal System? That way I could always type or  to find the Harry Potter book on whatever book website I was on instead of typing "harry potter goblet of fire" into a search box. Seriously.

    This is the kind of idea that makes sense when you are kicking it with your homeboys late at night drinking 40s and smokin' blunts but ends up making you scratch your head in the morning when you sober up, wondering how you could have ever come up with such a ludicrous idea.

  3. Facebook has 'thrown the entire startup world for a loop': This post is by a startup developer complaining that Facebook has placed limits on usage of their APIs which prevent Facebook widgets from spamming a user's friends when the user adds the widget to their profile. What does he expect? That Facebook should make it easier for applications to spam their users? WTF? Go read Mike Torres's post Facebook weirdness then come back and explain to me why the folks at Facebook should be making it easier for applications to send spam on a user's behalf in the name of encouraging the "viral growth of apps".

  4. Does negative press make you Sicko? Google ad sales rep makes impassioned pitch to big Pharmaceutical companies and HMOs to counter the negative attention from Michael Moore's Sicko by buying Google search ads and getting Google to create "Get the Facts" campaigns for them. I guess all that stuff Adam Bosworth said about Google wanting to help create better educated patients doesn't count since patients don't buy ads. ;) Talk about making your employer look like an unscrupulous, money grubbing whore. Especially 

    Do no evil. It's now Search, Ads and Apps

  5. People Who Got in Line for an iPhone: I was at the AT&T store on the day of the iPhone launch to pick up a USB cable for my fiancée. It took me less than ten minutes to deal with the line at around 8:00PM, and they still had lots of iPhones. It seems people had waited hours in line that day, yet I could have picked one up with just ten minutes of waiting on launch day if I'd wanted one. I bet if you came on Saturday the lines were even shorter, and by today you could just walk in. Of course, this is assuming you are crazy enough to buy a v1 iPhone in the first place.