These are my notes from the keynote session MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets by Jeff Dean.

The talk covered the three pillars of Google's data storage and processing platform: GFS, BigTable, and MapReduce.

GFS

The developers at Google decided to build their own custom distributed file system because they felt they had unique requirements. These requirements included:

  • scalable to thousands of network nodes
  • massive read/write bandwidth requirements
  • ability to handle large data files that are gigabytes in size
  • extremely efficient distribution of operations across nodes to reduce bottlenecks

One benefit the developers of GFS had was that since it was an in-house application, they could control the environment, the client applications, and the libraries much more tightly than they could have with an off-the-shelf system.

GFS Server Architecture

There are two server types in the GFS system.

Master servers
These keep the metadata for the various data files, which are stored in 64MB chunks within the file system. Client applications talk to the master servers to perform metadata operations on files or to locate the chunk server that holds the actual bits on disk.
Chunk servers
These hold the actual bits on disk and can be considered dumb file servers. Each chunk is replicated across three different chunk servers for redundancy in case of server crashes. Once a master server has directed them to the chunk server holding the chunk they want, client applications retrieve the data directly from that chunk server.
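
To make the division of labor concrete, here is a minimal sketch in Python of the read path described above. All of the class and method names (MasterServer, ChunkServer, locate_chunk, etc.) are hypothetical, invented for illustration; they are not the actual GFS API.

    CHUNK_SIZE = 64 * 1024 * 1024  # GFS files are stored as 64MB chunks

    class MasterServer:
        """Holds only metadata: which chunk servers hold which chunks."""
        def __init__(self, chunk_locations):
            # maps (file_name, chunk_index) -> list of replica ChunkServers
            self.chunk_locations = chunk_locations

        def locate_chunk(self, file_name, offset):
            return self.chunk_locations[(file_name, offset // CHUNK_SIZE)]

    class ChunkServer:
        """A 'dumb' file server that returns raw bytes from disk."""
        def __init__(self, chunks):
            self.chunks = chunks  # maps (file_name, chunk_index) -> bytes

        def read(self, file_name, chunk_index, offset_in_chunk, length):
            data = self.chunks[(file_name, chunk_index)]
            return data[offset_in_chunk:offset_in_chunk + length]

    def gfs_read(master, file_name, offset, length):
        # 1. Ask the master which chunk servers hold the chunk (metadata only).
        replicas = master.locate_chunk(file_name, offset)
        # 2. Fetch the bytes directly from one of the three replicas; the
        #    master never sits on the data path, which avoids a bottleneck.
        return replicas[0].read(file_name, offset // CHUNK_SIZE,
                                offset % CHUNK_SIZE, length)

The key design point is that the master handles only small metadata requests while the bulk data flows directly between clients and chunk servers.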

There are currently over 200 GFS clusters at Google, some of which have over 5,000 machines. They now have pools of tens of thousands of machines retrieving data from GFS clusters that hold as much as 5 petabytes of storage, with read/write throughput of over 40 gigabytes/second across a cluster.

MapReduce

At Google they process very large amounts of data. In the old days, developers had to write their own code to partition the large data sets, checkpoint code and save intermediate results, handle failover in case of server crashes, and so on, in addition to writing the business logic for the actual data processing they wanted to do, which could be something straightforward like counting the occurrence of words in various Web pages or grouping documents by content checksums. To reduce the duplication of effort and the complexity of performing data processing tasks, the decision was made to build a platform technology that everyone at Google could use and that handled all the generic tasks of working on very large data sets. So MapReduce was born.

MapReduce is an application programming interface for processing very large data sets. Application developers feed in key/value pairs (e.g. {URL, HTML content} pairs), then the map function extracts relevant information from each record and produces a set of intermediate key/value pairs (e.g. a {word, 1} pair for each time a word is encountered), and finally the reduce function merges the intermediate values associated with the same key to produce the final output (e.g. {word, total count of occurrences} pairs).

A developer only has to write the map and reduce operations specific to their data sets, which can be as little as 25-50 lines of code. The MapReduce infrastructure deals with parallelizing the task and distributing it across different machines, handling machine failures and error conditions in the data, applying optimizations such as moving computation close to the data to reduce the I/O bandwidth consumed, providing system monitoring, and making the service scalable across hundreds to thousands of machines.
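
As a concrete illustration, here is roughly what the word counting example above looks like, sketched in Python with a toy single-machine driver standing in for the infrastructure. The emit-style signatures are modeled on published descriptions of MapReduce, not Google's actual C++ API.

    import re
    from collections import defaultdict

    def map_fn(url, html_content):
        """Emit an intermediate {word, 1} pair for each word encountered."""
        for word in re.findall(r"\w+", html_content.lower()):
            yield (word, 1)

    def reduce_fn(word, counts):
        """Merge all intermediate values for the same key into a final count."""
        return (word, sum(counts))

    def run_mapreduce(records, map_fn, reduce_fn):
        # This toy driver does what the real infrastructure does for you,
        # minus the parallelism and fault tolerance across thousands of
        # machines: apply the map, shuffle intermediate pairs by key, reduce.
        intermediate = defaultdict(list)
        for key, value in records:
            for ikey, ivalue in map_fn(key, value):
                intermediate[ikey].append(ivalue)
        return [reduce_fn(k, vals) for k, vals in intermediate.items()]

    pages = [("http://a.example", "the cat sat"),
             ("http://b.example", "the dog sat")]
    print(run_mapreduce(pages, map_fn, reduce_fn))
    # [('the', 2), ('cat', 1), ('sat', 2), ('dog', 1)]

The map_fn and reduce_fn are the only parts a developer would actually write; everything in run_mapreduce is the platform's job.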

Currently, almost every major product at Google uses MapReduce in some way. There are 6,000 MapReduce applications checked into the Google source tree, with hundreds of new applications being written per month. To illustrate its ease of use, a graph of new MapReduce applications checked in over time shows a spike every summer as interns show up and create a flood of new MapReduce applications.

MapReduce Server Architecture

There are three server types in the MapReduce system.

Master server
This assigns user tasks to map and reduce servers and keeps track of the state of these tasks.
Map Servers
These accept user input, perform the map operation on it, and write the results to intermediate files.
Reduce Servers
These accept the intermediate files produced by map servers and perform the reduce operation on them.

One of the main issues they have to deal with in the MapReduce system is the problem of stragglers. Stragglers are servers that run slower than expected for one reason or another. Sometimes stragglers are due to hardware issues (e.g. a bad hard drive controller reducing I/O throughput) or may just be the result of a server running too many complex jobs that consume too much CPU. To counter the effects of stragglers, they now assign the same job to multiple servers, which counterintuitively ends up making tasks finish quicker. Another clever optimization is that all data transferred between map and reduce servers is compressed; since the servers usually aren't CPU bound, compression/decompression costs are a small price to pay for the bandwidth and I/O savings.
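
A toy sketch of the backup-task idea, assuming nothing about Google's actual scheduler: run the same task on several workers and take whichever result comes back first. The real system is more selective (it only schedules backups for the last few in-progress tasks), but the principle is the same, and all names here are hypothetical.

    import concurrent.futures
    import random
    import time

    def worker(task_id, worker_id):
        # Simulate a straggler: some workers are much slower than others,
        # e.g. because of a bad disk controller or an overloaded machine.
        time.sleep(random.choice([0.1, 0.1, 2.0]))
        return f"task {task_id} done by worker {worker_id}"

    def run_with_backups(task_id, num_copies=3):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=num_copies)
        futures = [pool.submit(worker, task_id, w) for w in range(num_copies)]
        # Wait only for the first copy to finish; the stragglers' duplicate
        # work is simply thrown away.
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        pool.shutdown(wait=False)
        return next(iter(done)).result()

    print(run_with_backups(42))

Redundant work is wasted on purpose here: a few duplicated tasks are cheap compared to an entire job blocked on one slow machine.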

BigTable

After the creation of GFS, the need for structured and semi-structured storage that went beyond opaque files became clear. Examples of situations that could benefit from this included:

  • associating metadata with a URL such as when it was crawled, its PageRank™, contents, links to it, etc
  • associating data with a user such as the user's search history and preferences
  • geographical data such as information about roads and satellite imagery

The system required would need to be able to scale to storing billions of URLs, hundreds of terabytes of satellite imagery, preferences associated with hundreds of millions of users, and more. It was immediately obvious that this wasn't a task for an off-the-shelf commercial database system, due to the scale requirements and the fact that such a system would be prohibitively expensive even if it did exist. In addition, an off-the-shelf system would not be able to make optimizations based on the underlying GFS file system. Thus BigTable was born.

BigTable is not a relational database. It does not support joins, nor does it support rich SQL-like queries. Instead it is more like a multi-level map data structure: a large-scale, fault-tolerant, self-managing system with terabytes of memory and petabytes of storage space that can handle millions of reads/writes per second. BigTable is now used by over sixty Google products and projects as the platform for storing and retrieving structured data.

The BigTable data model is fairly straightforward: each data item is stored in a cell which can be accessed using its {row key, column key, timestamp}. The need for a timestamp came about because many Google services store and compare the same data over time (e.g. the HTML content for a URL). The data for each row is stored in one or more tablets, which are actually sequences of 64KB blocks in a data format called SSTable.
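
The "multi-level map" nature of the data model is easy to see in a sketch. The toy class below (my own illustration, not the BigTable API) stores cells as row key -> column key -> timestamp -> value and returns the most recent version by default.

    from collections import defaultdict

    class ToyBigTable:
        """Cells addressed by {row key, column key, timestamp}."""
        def __init__(self):
            # row key -> column key -> {timestamp: value}
            self.rows = defaultdict(lambda: defaultdict(dict))

        def put(self, row_key, column_key, timestamp, value):
            self.rows[row_key][column_key][timestamp] = value

        def get(self, row_key, column_key, timestamp=None):
            versions = self.rows[row_key][column_key]
            if timestamp is None:
                timestamp = max(versions)  # latest version by default
            return versions[timestamp]

    table = ToyBigTable()
    table.put("com.example/index.html", "contents", 1, "<html>v1</html>")
    table.put("com.example/index.html", "contents", 2, "<html>v2</html>")
    print(table.get("com.example/index.html", "contents"))  # <html>v2</html>
    print(table.get("com.example/index.html", "contents", timestamp=1))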

BigTable Server Architecture

There are three primary server types of interest in the BigTable system.

Master servers
These assign tablets to tablet servers, keep track of where tablets are located and redistribute tablets as needed.
Tablet servers
These handle read/write requests for tablets and split tablets when they exceed size limits (usually 100MB - 200MB). If a tablet server fails, then 100 tablet servers each pick up one new tablet and the system recovers.
Lock servers
These are instances of the Chubby distributed lock service. Lots of actions within BigTable require the acquisition of locks, including opening tablets for writing, ensuring that there is no more than one active master at a time, and access control checking.

There are a number of optimizations that applications can take advantage of in BigTable. One example is the concept of locality groups. Some of the simple metadata associated with a particular URL which is typically accessed together (e.g. language, PageRank™, etc) can be physically stored together by placing it in a locality group, while other columns (e.g. content) live in a separate locality group. In addition, writes to a tablet are kept in memory until the machine starts running out of memory, at which point the data is written to GFS as an SSTable and a new in-memory table is created. This process is called compaction. There are other types of compaction where in-memory tables are merged with SSTables on disk to create an entirely new SSTable which is then stored in GFS.
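
Here is a rough sketch of the compaction behavior described above, under the simplifying assumption that an SSTable can be modeled as a sorted, immutable list of key/value pairs; the names and the size threshold are illustrative only, not taken from BigTable.

    MEMTABLE_LIMIT = 4  # absurdly small, just to trigger compactions here

    class ToyTablet:
        def __init__(self):
            self.memtable = {}   # recent writes, held in memory
            self.sstables = []   # immutable sorted runs, "written to GFS"

        def write(self, key, value):
            self.memtable[key] = value
            if len(self.memtable) >= MEMTABLE_LIMIT:
                self.compact()

        def compact(self):
            # Freeze the in-memory table as a sorted immutable run (standing
            # in for writing an SSTable to GFS) and start a fresh one.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

        def merging_compaction(self):
            # The other kind of compaction: merge the on-disk runs and the
            # memtable into one brand new SSTable.
            merged = {}
            for run in self.sstables:
                merged.update(dict(run))
            merged.update(self.memtable)
            self.sstables = [sorted(merged.items())]
            self.memtable = {}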

Current Challenges Facing Google's Infrastructure

Although Google's infrastructure works well at the single cluster level, there are a number of areas with room for improvement, including:

  • support for geo-distributed clusters
  • single global namespace for all data since currently data is segregated by cluster
  • more and better automated migration of data and computation
  • lots of consistency issues when you couple wide area replication with network partitioning (e.g. keeping services up even if a cluster goes offline for maintenance or due to some sort of outage).

Recruiting Sales Pitch

[The conference was part recruiting event so some of the speakers ended their talks with a recruiting spiel - Dare]

Having access to lots of data and computing power is a geek playground. You can build cool, seemingly trivial apps on top of the data which turn out to be really useful, such as Google Trends and catching misspellings of "britney spears". Another example of the kind of app you can build when you have enough data is treating the problem of language translation as a statistical modeling problem, which turns out to be one of the most successful approaches around.

Google hires smart people and lets them work in small teams of 3 to 5 people. They can get away with teams being that small because they have the benefit of an infrastructure that takes care of all the hard problems so devs can focus on building interesting, innovative apps.


 

Categories: Platforms | Trip Report

June 23, 2007
@ 03:32 PM

THEN: The PayPerPost Virus Spreads

Two new services that are similar to the controversial PayPerPost have announced their launch in the last few days: ReviewMe and CreamAid. PayPerPost, a marketplace for advertisers to pay bloggers to write about products (with or without disclosure), recently gained additional attention when they announced a $3 million round of venture financing.

The PayPerPost model brings up memories of payola in the music industry, something the FCC and state attorneys general are still trying to eliminate or control. Given the distributed and unlicensed nature of the blogosphere, controlling payoffs to bloggers will be exponentially more difficult.

Our position on these pay-to-shill services is clear: they are a natural result of the growth in size and influence of the blogosphere, but they undermine the credibility of the entire ecosystem and mislead readers.

NOW: I’m shocked, shocked to find that gambling is going on in here!

The title, which is a quote from the movie Casablanca, is what came to mind tonight when I read the complete train wreck occurring on TechMeme over advertisements that contain a written message from the publisher. The whole thing was started by Valleywag, of course.

The ads in question are a staple of FM Publishing - a standard ad unit contains a quote by the publisher saying something about something. It isn’t a direct endorsement. Rather, it’s usually an answer to some lame slogan created by the advertiser. It makes the ad more personal and has a higher click through rate, or so we’ve been told. In the case of the Microsoft ad, we were quoted on how we had become “people ready,” whatever that means. See our answer and some of the others here (I think it will be hard to find this text controversial, or anything other than extremely boring). We do these all the time…generally FM suggests some language and we approve or tweak it to make it less lame. The ads go up, we get paid. This has been going on for months and months - at least since the summer of 2006. It’s nothing new. It’s text in an ad box. I think people are pretty aware of what that means…which is nothing.

Any questions?


 

Categories: Current Affairs

I was reading reddit this morning and spotted a reference to the Microsoft Popfly team's group picture, which pointed out that, going by the job titles in the pic, there were 9 managers and 5 developers on the product team. The list of people in the picture and their titles is excerpted below

From left to right: John Montgomery (Group Program Manager), Andy Sterland (Program Manager), Alpesh Gaglani (Developer), Tim Rice (Developer), Suzanne Hansen (Program Manager), Steven Wilssens (Program Manager), Vinay Deo (Engineering Manager), Michael Leonard (Test Developer), Jianchun Xu (Developer), Dan Fernandez (Product Manager), Adam Nathan (Developer), Wes Hutchins (Program Manager), Aaron Brethorst (Program Manager), Paramesh Vaidyanathan (Product Unit Manager), and Murali Potluri (Developer).

A Microsoft employee followed up the reddit link with a comment pointing out that it is actually 5 devs, 5 PMs, 1 tester, 3 managers and 1 marketing person. This sounds a lot better, but I still find it interesting that there is a 1:1 ratio of Program Managers (i.e. design features/APIs, write specs, call meetings) to Developers (i.e. write code, fix bugs, ignore PMs). Although this ratio isn't unusual for Microsoft, it has always struck me as rather high. I've always felt that a decent ratio of PMs to developers is more like 1:2 or higher. And I've seen some claim ratios like 1 PM to 5 developers for Agile projects, but I haven't been able to find much about industry averages online. It seems most discussion about staffing ratios on software projects focuses on Developer to Test ratios, and even then the conclusion is that it depends. I think the PM to Developer ratio question is more clear cut.

What are good ratios that have worked for you in the past and what would you consider to be a bad ratio?

PS: A note underneath the group picture mentions that some folks on the team aren't pictured but I looked them up and they are in marketing so they aren't relevant to this discussion.


 

Categories: Programming

  1. XKCD: Pickup Lines: "If I could rearrange the alphabet..."

  2. Chart: Chances of a Man Winning an Argument plotted over Time: I'm in the middle period. :)

  3. Fake Steve Jobs: Microsoft Goes Pussy: "We've integrated search into our OS too. It makes sense. And Microsoft's search stuff in Vista is really good (God I just threw up in my mouth when I wrote that)..."

  4. Chris Kelly: Das Capital One: "Back before Capital One, there were just two kinds of consumers: People who could afford credit cards and people who couldn't afford credit cards...The guy who started Capital One imagined a third kind of person - someone who could almost afford a credit card. A virtual credit card holder. Something between a good risk and a social parasite."

  5. I CAN HAS CHEEZBURGER?: OH HAI GOOGLZ: Google Street View + lolcats = Comedic Gold

  6. Bileblog: Google Code - Ugliness is not just skin deep: "The administrative menu is, to put it as kindly as possible, whimsical. Menu items and options are scattered about like goat pebbleturds on a mountain. The only option under ‘Advanced’ is ‘Delete this project’. How is that advanced functionality?"

  7. Wikipedia: Pokémon test: "Each of the 493 Pokémon has its own page, all of which are bigger than stubs. While it would be expected that Pikachu would have its own page, some might be surprised to find out that Stantler has its own page, as well. Some people perceive Pokémon as something 'for little kids' and argue that if that gets an article, so should their favorite hobby/band/made-up word/whatever."

  8. YouTube: A Cialis Ad With Cuba Gooding Jr.: From the folks at NationalBanana, lots of funny content on their site.

  9. Bumper Sticker: Hell Was Full: Saw this on my way to work.

  10. YouTube: Microsoft Surface Parody - "The future is here and it's not an iPhone. It's a big @$$ table. Take that Apple"


 

According to my Feedburner stats, it seems I lost about 214 subscribers using Google Reader between Saturday June 16th and Sunday June 17th. This seems like a fairly significant number of readers to unsubscribe from my blog on a weekend, especially since I don't think I posted anything particularly controversial relative to my regular posts.

I was wondering if any other Feedburner users noticed a similar dip in their subscriber numbers via Google Reader over the weekend, or whether it's just a coincidence that I happened to lose so many regular readers at once?


 

Categories: Personal

In his post Implementing Silverlight in 21 Days, Miguel de Icaza writes

The past 21 days have been some of the most intense hacking days that I have ever had and the same goes for my team that worked 12 to 16 hours per day every single day --including weekends-- to implement Silverlight for Linux in record time. We call this effort Moonlight.

Needless to say, we believe that Silverlight is a fantastic development platform, and its .NET-based version is incredibly interesting and as Linux/Unix users we wanted to both get access to content produced with it and to use Linux as our developer platform for Silverlight-powered web sites.

His post is a great read for anyone who geeks out over phenomenal feats of hackery. Going over the Moonlight Project Page, it's interesting to note how useful blog posts from Microsoft employees were in helping Miguel's team figure out the internal workings of Silverlight.

In addition, it seems Miguel also learned a lot from hanging out with Jason Zander and Scott Guthrie which influenced some of the design of Moonlight. It's good to see Open Source developers working on Linux having such an amicable relationship with Microsoft developers.

Congratulations to the Mono team; it looks like we will have Silverlight on Linux after all. Sweet.


 

Categories: Platforms | Programming | Web Development

A couple of years ago, I wrote a blog post entitled Social Software: Finding Beauty in Walled Gardens where I riffed on the benefits of being able to tell the software applications you use regularly "here are the people I know, these are the ones I trust, etc". At the time I assumed that it would be one of the big Web companies such as Google, Yahoo!, or Microsoft that would build the killer social software platform powered by this unified view of your social connections. I was wrong. Facebook has beaten everyone to doing it first. There are a lot of user scenarios on the Web that can be improved if the applications we use know who our friends, family and co-workers are without us having to explicitly tell them. Below are a couple of online services where access to a user's social network has made Facebook better at performing certain Web tasks than the traditional market leaders.

NOTE: This started off as three different blog posts in my writing queue but after reading ridiculous overhype like ex-Google employees publicly decamping from their former employer because 'Facebook is the Google of yesterday, the Microsoft of long ago' I decided to scale back my writing about the service and merge all my thoughts into a single post. 

Displacing Email for Personal Communication

Gervase Markham, an employee of the Mozilla Foundation, recently wrote in his blog post entitled The Proprietarisation of Email

However, I also think we need to be aware of current attempts to make email closed and proprietary.

What am I talking about, I hear you ask? No-one's resurrected the idea of a spam-free email walled garden recently. Companies who tout their own secure mail protocols come and go and no-one notes their passing. The volume of legitimate email sent continues to grow. What's the worry?

I'm talking about the messaging systems built into sites like Facebook and LinkedIn. On several occasions recently, friends have chosen to get back in touch with me via one of these rather than by email. Another friend recently finished a conversation with a third party by saying "Facebook me"; when I asked her why she didn't just use email, she said "Oh, Facebook is so much easier".

And she's right. There's no spam, no risk of viruses or phishing, and you have a ready-made address book that you don't have to maintain. You can even do common mass email types like "Everyone, come to this event" using a much richer interface. Or other people can see what you say if you "write on their wall". In that light, the facts that the compose interface sucks even more than normal webmail, and that you don't have export access to a store of your own messages, don't seem quite so important.

After I read this post, I reflected on my casual use of the user to user messaging feature on Facebook and realized that even though I've only used it a handful of times, I've used it to communicate with friends and family a lot more than I have used either of my personal email addresses in the past three months. In fact, there are a bunch of friends and family whose email addresses I don't know that I've only communicated with online through Facebook. That's pretty wild. The fact that I don't get spam or random messages from people I don't know is also a nice plus and something a lot of other social network sites could learn from.

So one consequence of Facebook being used heavily by people in my real-life social network is that it is now more likely to be my interface for communicating with people I know personally than email. I suspect that if they ever add an instant messaging component to the site, it could significantly change the demographics of the top instant messaging applications.

Changing the Nature of Software Discovery and Distribution

I wrote about this yesterday in my post Marc Andreessen: The GoDaddy 2.0 Business Model but I think the ramifications of this are significant enough that it bears repeating. The viral software distribution model is probably one of the biggest innovations in the Facebook platform. Whenever my friends add an application to their dashboard, I get a message in my news feed with a link to try out the application. I've tried out a couple of applications this way and it seems like a very novel and viral way to distribute applications. For one thing, it is definitely a better way to discover new Facebook applications than browsing the application directory. Secondly, it also means that the best software is found a lot more quickly. The iLike folks have a blog post entitled Holy cow... 6mm users and growing 300k/day! in which they show a graph indicating that iLike on Facebook has grown faster in its first few weeks than a number of services that grew quite quickly in their day, including Skype, Hotmail, Kazaa and ICQ. 6 million new users in less than a month? 300,000 new users a day? Wow.

Although there are a number of issues to work out before transferring this idea to other contexts, I believe that this is a very compelling approach to how new software is discovered and distributed. I would love it if my friends and family got a notification whenever I discovered a useful Firefox add-on or a great Sidebar gadget, and vice versa. I wouldn't be surprised if this concept starts showing up in other places very soon.

Facebook Marketplace: A Craigslist Killer

Recently my former apartment was put up for rent after I broke the lease as part of the process of moving into a house. I had assumed that it would be listed in the local paper and apartment finding sites like Apartments.com. However I was surprised to find out from the property manager that they only listed apartments on Craig's List because it wasn't worth it to list anywhere else anymore. It seemed that somewhere along the line, the critical mass of apartment hunters had moved to using Craig's List for finding apartments instead of the local paper.

Since then I've used Craig's List myself and I was very dissatisfied with the experience. Besides the prehistoric user interface, I had to kiss a lot of frogs before finding a prince. I called about ten people based on their listings and could only reach about half of them. Of those, one person said he'd call back and didn't, another said he'd deliver to my place and then switched off his phone after I called to ask why he was late (he eventually never showed), while yet another promised to show up then called back to cancel because his wife didn't want him leaving the house on a weekend. I guess it should be unsurprising how untrustworthy and flaky a bunch of the people listing goods and services for sale on Craig's List are, since it doesn't cost anything to create a listing.

Now imagine if I could get goods and services only from people I know, people they know, or a somewhat trusted circle (e.g. people who work for the same employer). Wouldn't that lead to a better user experience than what I had to deal with on Craig's List? In fact, this was the motivation behind Microsoft's Windows Live Expo which is billed as a "social marketplace". However, the problem with marketplaces is that you need a critical mass of buyers and sellers for them to thrive. Enter Facebook Marketplace.

Of course, this isn't a slam dunk for Facebook; in fact, right now there are ten times as many items listed for sale in the Seattle area on Windows Live Expo as on Facebook Marketplace (5,210 vs. 520). Even more interesting is that a number of listings on Facebook Marketplace actually link back to listings on Craig's List, which implies that people aren't taking it seriously as a listing service yet.

Conclusion

From the above, it is clear that there is a lot of opportunity for Facebook to dominate and change a number of online markets beyond just social networking sites. However it is not a done deal. The company is estimated to have about 200 employees and that isn't a lot in the grand scheme of things. There is already evidence that they have been overwhelmed by the response to their platform when you see some of the complaints from developers about insufficient support and poor platform documentation.

In addition, it seems the core Facebook application is not seeing enough attention when you consider that there is some fairly obvious functionality that doesn't exist. Specifically, it is quite surprising that Facebook doesn't take advantage of the wisdom of the crowds to power local recommendations. If I were a college freshman, new employee or some other recently transplanted individual, it would be cool to plug into my social network to find out where to go for the best Chinese food, pizza, or nightclubs in the area.

I suspect that the speculation on blogs like Paul Kedrosky's is right and we'll see Facebook try to raise a bunch of money to fuel growth within the next 12 - 18 months. To reach its full potential, the company needs a lot more resources than it currently has.


 

I read Marc Andreessen's Analyzing the Facebook Platform, three weeks in and although there's a lot to agree with, I was also confused by how he defined a platform in the Web era. After a while, it occurred to me that Marc Andreessen's Ning is part of an increased interest by a number of Web players in chasing after what I like to call the GoDaddy 2.0 business model. Specifically, a surprisingly disparate group of companies seem to think that the business of providing people with domain names and free Web hosting so they can create their own "Web 2.0" service is interesting. None of these companies have actually come out and said it, but whenever I think of Ning or Amazon's S3+EC2, I see a bunch of services that seem to be diving backwards into the Web hosting business dominated by companies like GoDaddy.

Reading Marc's post, I realized that I didn't think of the facilities that a Web hosting provider gives you as "a platform". When I think of a Web platform, I think of an existing online service that enables developers to either harness the capabilities of that service or access its users in a way that allows the developers to add value to the user experience. The Facebook platform is definitely in this category. On the other hand, the building blocks it takes to actually build a successful online service, including servers, bandwidth and software building blocks (LAMP/RoR/etc), can also be considered a platform. This is where Ning and Amazon's S3+EC2 fit. With that context, let's look at the parts of Marc's Analyzing the Facebook Platform, three weeks in which moved me to write this. Marc writes

Third, there are three very powerful potential aspects of being a platform in the web era that Facebook does not embrace.

The first is that Facebook itself is not reprogrammable -- Facebook's own code and functionality remains closed and proprietary. You can layer new code and functionality on top of what Facebook's own programmers have built, but you cannot change the Facebook system itself at any level.

There doesn't seem to be anything fundamentally wrong with this. When you look at some of the most popular platforms in the history of the software industry such as Microsoft Windows, Microsoft Office, Mozilla Firefox or Java you notice that none of them allow applications built on them to fundamentally change what they are. This isn't the goal of an application platform. More specifically, when one considers platforms that were already applications such as Microsoft Office or Mozilla Firefox it is clear that the level of extensibility allowed is intended to allow improving the user experience while utilizing the application and thus make the platform more sticky as opposed to reprogramming the core functionality of the application.

The second is that all third-party code that uses the Facebook APIs has to run on third-party servers -- servers that you, as the developer provide. On the one hand, this is obviously fair and reasonable, given the value that Facebook developers are getting. On the other hand, this is a much higher hurdle for development than if code could be uploaded and run directly within the Facebook environment -- on the Facebook servers.

This is one unfortunate aspect of Web development which tends to harm hobbyists. Although it is possible for me to find the tools to create and distribute desktop applications for little or no cost, the same cannot be said about Web software. At the very least I need a public Web server and the ability to pay for the hosting bills if my service gets popular. This is one of the reasons I can create and distribute RSS Bandit to thousands of users as a part time project with no cost to me except my time but cannot say the same if I wanted to build something like Google Reader.

This is a significant barrier to adoption of certain Web platforms which is a deal breaker for many developers who potentially could add a lot of value. Unfortunately, building an infrastructure that allows you to run arbitrary code from random Web developers and gives these untrusted applications database access without harming your core service costs more in time and resources than most Web companies can afford. For now.

The third is that you cannot create your own world -- your own social network -- using the Facebook platform. You cannot build another Facebook with it.

See my response to his first point. The primary reason for the existence of the Facebook platform is to harness the creativity and resources of outside developers to benefit the social networks within Facebook. Allowing third party applications to fracture this social network or build competing services doesn't benefit the Facebook application. What Facebook offers developers is access to an audience of engaged users, and in exchange these developers make Facebook a more compelling service by building cool applications on it. That way everybody wins.

An application that takes off on Facebook is very quickly adopted by hundreds of thousands, and then millions -- in days! -- and then ultimately tens of millions of users.

Unless you're already operating your own systems at Facebook levels of scale, your servers will promptly explode from all the traffic and you will shortly be sending out an email like this.
...
The implication is, in my view, quite clear -- the Facebook Platform is primarily for use by either big companies, or venture-backed startups with the funding and capability to handle the slightly insane scale requirements. Individual developers are going to have a very hard time taking advantage of it in useful ways.

I think Marc is overblowing the problem here, if one can even call it a problem. A fundamental truth of building Web applications is that if your service is popular then you will eventually hit scale problems. This was happening last century during "Web 1.0" when eBay outages were regular headlines and website owners used to fear the Slashdot effect. Until the nature of the Internet is fundamentally changed, this will always be the case.

However, none of this means you can't build a Web application unless you have VC money or are a big company. Instead, you should just have a strategy for keeping your servers up and running if your service becomes a massive hit with users. It's a good problem to have, but one needs to remember that most Web applications will never have that problem. ;)

When you develop a new Facebook application, you submit it to the directory and someone at Facebook Inc. approves it -- or not.

If your application is not approved for any reason -- or if it's just taking too long -- you apparently have the option of letting your application go out "underground". This means that you need to start your application's proliferation some other way than listing it in the directory -- by promoting it somewhere else on the web, or getting your friends to use it.

But then it can apparently proliferate virally across Facebook just like an approved application.

I think the viral distribution model is probably one of the biggest innovations in the Facebook platform. Announcing to my friends whenever I install a new application so that they can try it out themselves is pretty killer. This feature probably needs to be fine tuned so I don't end up recommending or being recommended bad apps like X Me, but that is such a minor nitpick. This is potentially a game changing move in the world of software distribution. I mean, can you imagine if you got a notification whenever one of your friends discovered a useful Firefox add-on or a great Sidebar gadget? It definitely beats using TechCrunch or Download.com as your source of cool new apps.


 

Categories: Platforms | Social Software

Doc Searls has a blog post entitled Questions Du Jour where he writes

Dare Obasanjo: Why Facebook is Bigger than Blogging. In response to Kent Newsome's request for an explanation of why Facebook is so cool.

While I think Dare makes some good points, his headline (which differs somewhat from the case he makes in text) reads to me like "why phones are better than books". The logic required here is AND, not OR. Both are good, for their own reasons.

Unlike phones and books, however, neither blogging nor Facebook are the final forms of the basics they offer today.

I think Doc has mischaracterized my post on why social networking services have seen broader adoption than blogging. My post wasn't about which is better since such a statement is as unhelpful as saying a banana is better than an apple. A banana is a better source of potassium but a worse source of dietary fiber. Which is better depends on what metric you are interested in.

My post was about popularity and explaining why, in my opinion, more people create and update their profiles on social networks than write blogs. 


 

Categories: Social Software

Every once in a while someone asks me about software companies to work for in the Seattle area that aren't Microsoft, Amazon or Google. This is the fourth in a series of weekly posts about startups in the Seattle area that I often mention to people when they ask me this question.

Zillow is a real-estate Web site that is slowly revolutionizing how people approach the home buying experience. The service caters to buyers, sellers, potential sellers and real estate professionals in the following ways:

  1. For buyers: You can research a house and find out its vital statistics (e.g. number of bedrooms, square footage, etc), its current estimated value and how much it sold for when it was last sold. In addition, you can scope out homes that were recently sold in the neighborhood and get a good visual representation of the housing market in a particular area.
  2. For sellers and agents: Homes for sale can be listed on the service.
  3. For potential sellers: You can post a Make Me Move™ price without having to actually list your home for sale.

I used Zillow as part of the home buying process when I got my place and I think the service is fantastic. They also have the right level of buzz, given recent high-profile poachings of Google employees and various profiles in the financial press.

The company was founded by Lloyd Frink and Richard Barton who are ex-Microsoft folks whose previous venture was Expedia, another Web site that revolutionized how people approached a common task.

Press: Fortune on Zillow

Number of Employees: 133

Location: Seattle, WA (Downtown)

Jobs: careers@zillow.hrmdirect.com, current open positions include a number of Software Development Engineer and Software Development Engineer in Test positions as well as a Systems Engineer and Program Manager position.