I was recently in a conversation where we were talking about things we'd learned in college that helped us in our jobs today. I tried to distill it down to one thing but couldn't, so here are the three I came up with.

  1. Operating systems aren't elegant. They are a glorious heap of performance hacks piled one upon the other.
  2. Software engineering is the art of amassing collected anecdotes and calling them Best Practices when in truth they have more in common with fads than anything else.
  3. Pizza is better than Chinese food for late night coding sessions.

What are yours?


 

Categories: Technology

Disclaimer: This is my opinion. It does not reflect the intentions, strategies, plans or stated goals of my employer.

Ever since the last Microsoft reorg where its Web products were spread out across 3 Vice Presidents, I've puzzled about why the company would want to fragment its product direction in such a competitive space instead of having a single person responsible for its online strategy.

Today, I was reading an interview with Chris Jones, the corporate vice president of Windows Live Experience Program Management, entitled Windows Live Moves Into Next Phase with Renewed Focus on Software + Services and a lightbulb went off in my head. The relevant bits are excerpted below:

PressPass: What else is Microsoft announcing today?

Jones: Today we’re also releasing a couple of exciting new services from Windows Live into managed beta testing: Windows Live Photo Gallery beta and Windows Live Folders beta.

Windows Live Photo Gallery is an upgrade to Windows Vista’s Windows Photo Gallery, offered at no charge, and enables both Windows Vista and Windows XP SP2 customers to share, edit, organize and print photos and digital home videos... We’re also releasing Windows Live Folders into managed beta today, which will provide customers with 500 megabytes of online storage at no charge.
...
We’re excited about these services and we see today’s releases as yet another important step on the path toward the next generation of Windows Live, building on top of the momentum of other interesting beta releases we’ve shared recently such as Windows Live Mail beta, Windows Live Messenger beta and Windows Live Writer beta....soon we’ll begin to offer a single installer which will give customers the option of an all-in-one download for the full Windows Live suite of services instead of the separate installation experience you see today. It’s going to be an exciting area to watch, and there’s a lot more to come.

PressPass: You talk a lot about a “software plus services” strategy. What does that mean and how does it apply to what you’re talking about today?

Jones: It’s become a buzz word of sorts in the industry, but it’s a strategy we truly believe in. The fact that we’re committed to delivering software plus services means we’re focused on building rich experiences on top of your Windows PC; services like those offered through Windows Live.

All the items in red font refer to Windows desktop applications in one way or another. At this point it made sense to me why there were three VPs running different bits of Microsoft's online products and why one of them was also the VP that owned Windows. The last reorg seems to have divided Microsoft's major tasks in the online space across the various VPs in the following manner:

  • Satya Nadella: Running the search + search ads business (i.e. primarily competing with Google search and AdWords)

  • Steve Berkowitz: Running the content + display ads business (i.e. primarily competing with Yahoo!'s content and display ad offerings)

  • Steven Sinofsky and Chris Jones: Adding value to the Windows platform using online services (i.e. building something similar to iLife + .Mac for Windows users). 

From that perspective, the reorgs make a lot more sense now. The goals and businesses are different enough that having people singularly focused on each of those tasks makes more sense than having one person worry about such disparate [and perhaps conflicting] goals. The interesting question to me is: what does it mean for Microsoft's Web-based Windows Live properties like Windows Live Hotmail, Windows Live Favorites and Windows Live Spaces if Microsoft is going to be emphasizing the Windows in Windows Live? I guess we've already seen some announcements from the mail side like Windows Live Mail and the Microsoft Office Outlook Connector now being free.

Another interesting question is where Ray Ozzie fits into all of this.


 

Categories: Life in the B0rg Cube | MSN | Windows Live

These are my notes from the talk Scaling Google for Every User by Marissa Mayer.

Google search has lots of different users who vary in age, sex, location, education, expertise and a lot of other factors. After lots of research, it seems the only factor that really influences how different users view search relevance is their location.

One thing that does distinguish users is the difference between a novice search user and an expert user of search. Novice users typically type queries in natural language while expert users use keyword searches.

Example Novice and Expert Search User Queries

NOVICE QUERY: Why doesn't anyone carry an umbrella in Seattle?
EXPERT QUERY: weather seattle washington

NOVICE QUERY: can I hike in the seattle area?
EXPERT QUERY: hike seattle area

On average, it takes a new Google user 1 month to go from typing novice queries to being a search expert. This means that there is little payoff in optimizing the site to help novices since they become search experts in such a short time frame.

Design Philosophy

In general, when it comes to the PC user experience, the more features available the better the user experience. However when it comes to handheld devices the graph is a bell curve and there comes a point where adding extra features makes the user experience worse. At Google, they believe their experience is more like the latter and tend to hide features on the main page and only show them when necessary (e.g. after the user has performed a search). This is in contrast to the portal strategy from the 1990s when sites would list their entire product line on the front page.

When tasked with taking over the user interface for Google search, Marissa Mayer fell back on her AI background and focused on applying mathematical reasoning to the problem. Like Amazon, they decided to use split A/B testing to test different changes they planned to make to the user interface to see which got the best reaction from their users. One example of the kind of experiments they've run came when the founders asked whether they should switch from displaying 10 search results by default because Yahoo! was displaying 20 results. They'd only picked 10 results arbitrarily because that's what AltaVista did. They held some focus groups and the majority of users said they'd like to see more than 10 results per page. So they ran an experiment with 20, 25 and 30 results and were surprised at the outcome. After 6 weeks, 25% of the people who were getting 30 results used Google search less while 20% of the people getting 20 results used the site less. The initial suspicion was that people weren't having to click the "next" button as much because they were getting more results, but further investigation showed that people rarely click that link anyway. Then the Google researchers realized that while it took 0.4 seconds on average to render 10 results, it took 0.9 seconds on average to render 25 results. This seemingly imperceptible lag was still enough to sour the experience of users to the point that they reduced their usage of the service.
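For illustration, here is a minimal sketch of how a split test like this could assign users to buckets deterministically. The bucket names, traffic shares and hashing scheme are my own assumptions and not Google's actual experiment framework.

import hashlib

BUCKETS = [("10_results", 70), ("20_results", 10), ("25_results", 10), ("30_results", 10)]

def assign_bucket(user_id, experiment="results_per_page"):
    # Hash the user and experiment name so a user's assignment is stable across visits.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    slot = int(digest, 16) % 100          # map the hash onto 100 traffic slots
    for name, share in BUCKETS:
        if slot < share:
            return name
        slot -= share
    return BUCKETS[0][0]                  # unreachable if the shares sum to 100

print(assign_bucket("user-42"))           # the same user always lands in the same bucket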

Improving Google Search

There are a number of factors that determine whether a user will find a set of search results to be relevant, including the query, the actual user's individual tastes, the task at hand and the user's locale. Locale is especially important because a query such as "GM" is likely to be a search for General Motors but a query such as "GM foods" is most likely seeking information about genetically modified foods. Given a large enough corpus of data, statistical inference can seem almost like artificial intelligence. Another example is that a search like b&b ab looks for bed and breakfasts in Alberta while ramstein ab locates the Ramstein Air Force Base. This is because in general b&b typically means bed and breakfast, so for a search like "b&b ab" it is assumed that the term after "b&b" is a place name, based on statistical inference over millions of such queries.

At Google they want to get even better at knowing what you mean instead of just looking at what you say. Here are some examples of user queries which Google will transform to other queries based on statistical inference [in future versions of the search engine]

USER QUERY: unchanged lyrics van halen
GOOGLE WILL ALSO TRY: lyrics to unchained by van halen

USER QUERY: how much does it cost for an exhaust system
GOOGLE WILL ALSO TRY: cost exhaust system

USER QUERY: overhead view of bellagio pool
GOOGLE WILL ALSO TRY: bellagio pool pictures

USER QUERY: distance from zurich switzerland to lake como italy
GOOGLE WILL ALSO TRY: train milan italy zurich switzerland

Performing query inference in this manner is a very large scale, ill-defined problem. Another effort Google is pursuing is cross-language information retrieval. Specifically, if I perform a query in one language it will be translated into a foreign language and the results will then be translated back into my language. This may not be particularly interesting for English speakers since most of the Web is in English but it will be valuable for other languages (e.g. an Arabic speaker interested in reviews of New York City restaurants).

Google Universal Search was a revamp of the core engine to show results other than text-based URLs and website summaries in the search results (e.g. search for nosferatu). There were a number of challenges in building this functionality such as

  • Google's search verticals such as books, blog, news, video, and image search got a lot less traffic than the main search engine and originally couldn't handle receiving the same level of traffic as the main page.
  • How do you rank results across different media to figure out the most relevant? How do you decide a video result is more relevant than an image or a webpage? This problem was tackled by Udi Manber's team.
  • How do you integrate results from other media into the existing search result page? Should results be segregated by type or should it be a list ordered by relevance independent of media type? The current design was finally decided upon by Marissa Mayer's team but they will continue to incrementally improve it and measure the user reactions.

At Google, the belief is that the next big revolution is a search engine that understands what you want because it knows you. This means personalization is the next big frontier. A couple of years ago, the tech media was full of reports that a bunch of Stanford students had figured out how to make Google five times faster. This was actually incorrect. The students had figured out how to make PageRank calculations faster which doesn't really affect the speed of obtaining search results since PageRank is calculated offline. However this was still interesting to Google and the students' company was purchased. It turns out that making PageRank faster means that they can now calculate multiple PageRanks in the time it used to take to calculate a single PageRank (e.g. country specific PageRank, personal PageRank for a given user, etc). The aforementioned Stanford students now work on Google's personalized search efforts.

Speaking of personalization, iGoogle has become their fastest growing product of all time. Allowing users to create a personalized page and then opening up the platform to developers such as Caleb to build gadgets lets them learn more about their users. Caleb's collection of gadgets garners about 30 million daily page views on various personalized homepages.

Q&A

Q: Does the focus on expert searchers mean that they de-emphasize natural language processing?
A: Yes, in the main search engine. However they do focus on it for their voice search product and they do believe that it is unfortunate that users have to adapt to Google's keyword based search style.

Q: How do the observations that are data mined about users' search habits get back into the core engine?
A: Most of it happens offline, not automatically. Personalized search is an exception and this data is uploaded periodically into the main engine to improve the results specific to that user.

Q: How well is the new Universal Search interface doing?
A: As well as Google Search is since it is now the Google search interface.

Q: What is the primary metric they look at during A/B testing?
A: It depends on what aspect of the service is being tested.

Q: Has there been user resistance to new features?
A: Not really. Google employees are actually more resistant to changes in the search interface than their average user.

Q: Why did they switch to showing Google Finance before Yahoo! Finance when showing search results for a stock ticker?
A: Links used to be ordered by ComScore metrics but since Google Finance shipped they decided to show their service first. This is now a standard policy for Google search results that contain links to other services.

Q: How do they tell if they have bad results?
A: They have a bunch of watchdog services that track uptime for various servers to make sure a bad one isn't causing problems. In addition, they have 10,000 human evaluators who are always manually checking the relevance of various results.

Q: How do they deal with spam?
A: There are lots of definitions for spam: bad queries, bad results and email spam. For keeping out bad results they do automated link analysis (e.g. examining an excessive number of links to a URL from a single domain or set of domains) and they use multiple user agents to detect cloaking.

Q: What percent of the Web is crawled?
A: They try to crawl most of it except that which is behind signins and product databases. And for product databases they now have Google Base and encourage people to upload their data there so it is accessible to Google.

Q: When will I be able to search using input other than text (e.g. find this tune or find the face in this photograph)?
A: We are still a long way from this. In academia, we now have experiments that show 50%-60% accuracy but that's a far cry from being a viable end user product. Customers don't want a search engine that gives relevant results half the time.


 

Categories: Trip Report

These are my notes from the talk Lessons in Building Scalable Systems by Reza Behforooz.

The Google Talk team has produced multiple versions of their application. There are:

  • a desktop IM client which speaks the Jabber/XMPP protocol
  • a Web-based IM client that is integrated into GMail
  • a Web-based IM client that is integrated into Orkut
  • an IM widget which can be embedded in iGoogle or in any website that supports embedding Flash

Google Talk Server Challenges

The team has had to deal with a significant set of challenges since the service launched, including:

  • Support displaying online presence and sending messages for millions of users. Peak traffic is in the hundreds of thousands of queries per second, with a daily average of billions of messages handled by the system.

  • Routing and application logic has to be applied to each message according to the preferences of each user while keeping latency under 100ms.

  • Handling surges of traffic from integration with Orkut and GMail.

  • Ensuring in-order delivery of messages.

  • Needing an extensible architecture which could support a variety of clients.

Lessons

The most important lesson the Google Talk team learned is that you have to measure the right things. Questions like "how many active users do you have" and "how many IM messages does the system carry a day" may be good for evaluating market share but are not good questions from an engineering perspective if one is trying to get insight into how the system is performing.

Specifically, the biggest strain on the system actually turns out to be displaying presence information. The formula for determining how many presence notifications they send out is

total_number_of_connected_users * avg_buddy_list_size * avg_number_of_state_changes

Sometimes there are drastic jumps in these numbers. For example, integrating with Orkut increased the average buddy list size since people usually have more friends in a social networking service than they have IM buddies.
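To make the formula concrete, here is a quick back-of-the-envelope calculation. All of the numbers are made up for illustration and are not Google Talk's real traffic.

connected_users = 1_000_000
avg_buddy_list_size = 20
avg_state_changes = 6                     # sign-ins, idle/away flips, sign-outs, etc.

notifications = connected_users * avg_buddy_list_size * avg_state_changes
print(f"{notifications:,} presence notifications")             # 120,000,000

# Doubling the average buddy list size (e.g. after a social network integration)
# doubles the notification volume even if nothing else changes.
print(f"{connected_users * 40 * avg_state_changes:,}")          # 240,000,000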

Other lessons learned were

  1. Slowly Ramp Up High Traffic Partners: To see what real world usage patterns would look like when Google Talk was integrated with Orkut and GMail, both services added code to the pages that displayed a user's contacts to fetch online presence from the Google Talk servers, without adding any UI integration. This way the feature could be tested under real load without users noticing anything if there were capacity problems. In addition, the feature was rolled out to small groups of users at first (around 1%).

  2. Dynamic Repartitioning: In general, it is a good idea to divide user data across various servers (aka partitioning or sharding) to reduce bottlenecks and spread out the load. However, the infrastructure should support redistributing these partitions/shards without having to cause any downtime.

  3. Add Abstractions that Hide System Complexity: Partner services such as Orkut and GMail don't know which data centers contain the Google Talk servers, how many servers are in the Google Talk cluster and are oblivious of when or how load balancing, repartitioning or failover occurs in the Google Talk service.

  4. Understand Semantics of Low Level Libraries: Sometimes low level details can bite you. The Google Talk developers found that using epoll worked better than a poll/select loop because they have lots of open TCP connections but only a relatively small number of them are active at any time (see the sketch after this list).

  5. Protect Against Operational Problems: Review logs and endeavor to smooth out spikes in activity graphs. Limit cascading problems by having logic to back off from using busy or sick servers.

  6. Any Scalable System is a Distributed System: Apply the lessons from the fallacies of distributed computing. Add fault tolerance to all your components. Add profiling to live services and follow transactions as they flow through the system (preferably in a non-intrusive manner). Collect metrics from services for monitoring both for real time diagnosis and offline generation of reports.
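To illustrate the point in lesson 4, here is a minimal sketch of an epoll-style event loop using Python's selectors module (which is backed by epoll on Linux). Registered-but-idle connections cost nothing on each wakeup; only sockets with pending events are returned. This is purely illustrative and is not Google Talk's server code.

import socket
import selectors                           # DefaultSelector uses epoll on Linux

sel = selectors.DefaultSelector()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, data="accept")

def serve_once(timeout=0.1):
    # Only sockets with pending events are returned, no matter how many
    # thousands of idle connections are registered.
    for key, _ in sel.select(timeout):
        if key.data == "accept":
            conn, _ = key.fileobj.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data="client")
        else:
            msg = key.fileobj.recv(4096)
            if not msg:                    # client disconnected
                sel.unregister(key.fileobj)
                key.fileobj.close()

serve_once()                               # a real server would call this in a loop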

Recommended Software Development Strategies

Compatibility is very important, so making sure deployed binaries are backwards and forward compatible is always a good idea. Giving developers access to live servers (ideally public beta servers not main production servers) will encourage them to test and try out ideas quickly. It also gives them a sense of empowerment. Developers end up making their systems easier to deploy, configure, monitor, debug and maintain when they have a better idea of the end to end process.

Building an experimentation platform which allows you to empirically test the results of various changes to the service is also recommended.


 

Categories: Platforms | Trip Report

These are my notes from the talk Using MapReduce on Large Geographic Datasets by Barry Brummit.

Most of this talk was a repetition of the material in the previous talk by Jeff Dean including reusing a lot of the same slides. My notes primarily contain material I felt was unique to this talk.

A common pattern across a lot of Google services is creating lots of index files that point at the underlying data and loading them into memory to make lookups fast. This is also done by the Google Maps team, which has to handle massive amounts of data (e.g. there are over a hundred million roads in North America).

Below are examples of the kinds of problems the Google Maps team has used MapReduce to solve.

Locating all points that connect to a particular road
  Input: List of roads and intersections
  Map: Create pairs of connected points such as {road, intersection} or {road, road} pairs
  Shuffle: Sort by key
  Reduce: Get list of pairs with the same key
  Output: A list of all the points that connect to a particular road

Rendering Map Tiles
  Input: Geographic feature list
  Map: Emit each feature on a set of overlapping lat/long rectangles
  Shuffle: Sort by key
  Reduce: Emit tile using data for all enclosed features
  Output: Rendered tiles

Finding Nearest Gas Station to an Address within Five Miles
  Input: Graph describing node network with all gas stations marked
  Map: Search five mile radius of each gas station and mark distance to each node
  Shuffle: Sort by key
  Reduce: For each node, emit path and gas station with the shortest distance
  Output: Graph marked with nearest gas station to each node
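To make the first pipeline concrete, here is a minimal sketch of the road connectivity example using plain Python dictionaries in place of Google's MapReduce library. The road and intersection names are made up.

from collections import defaultdict

records = [
    ("NE 8th St", "NE 8th St & I-405"),
    ("I-405", "NE 8th St & I-405"),
    ("NE 8th St", "NE 8th St & Bellevue Way"),
    ("Bellevue Way", "NE 8th St & Bellevue Way"),
]

def map_phase(road, point):
    yield road, point                      # emit {road, connected point} pairs

def reduce_phase(road, points):
    return road, sorted(set(points))       # all points that connect to this road

intermediate = defaultdict(list)           # the shuffle step: group values by key
for road, point in records:
    for key, value in map_phase(road, point):
        intermediate[key].append(value)

for road, points in (reduce_phase(r, p) for r, p in intermediate.items()):
    print(road, "->", points)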

When issues are encountered in a MapReduce job, developers can debug them by running their MapReduce applications locally on their desktops.

Developers who would like to harness the power of a several hundred to several thousand node cluster but do not work at Google can try

Recruiting Sales Pitch

[The conference was part recruiting event so some of the speakers ended their talks with a recruiting spiel. - Dare]

The Google infrastructure is the product of Google's engineering culture, which has the following ten characteristics:

  1. Single source code repository for all Google code
  2. Developers can checkin fixes for any Google product
  3. You can build any Google product in three steps (get, configure, make)
  4. Uniform coding standards across the company
  5. Mandatory code reviews before checkins
  6. Pervasive unit testing
  7. Tests run nightly, emails sent to developers if any failures
  8. Powerful tools that are shared company-wide
  9. Rapid project cycles, developers change projects often, 20% time
  10. Peer driven review process, flat management hierarchy

Q&A

Q: Where are intermediate results from map operations stored?
A: In BigTable or GFS

Q: Can you use MapReduce incrementally? For example, when new roads are built in North America do we have to run MapReduce over the entire data set or can we only factor in the changed data?
A: Currently, you'll have to process the entire data stream again. However this is a problem that is the target of lots of active research at Google since it affects a lot of teams.


 

Categories: Platforms | Trip Report

These are my notes from the keynote session MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets by Jeff Dean.

The talk was about the three pillars of Google's data storage and processing platform: GFS, BigTable and MapReduce.

GFS

The developers at Google decided to build their own custom distributed file system because they felt that they had unique requirements. These requirements included

  • scalable to thousands of network nodes
  • massive read/write bandwidth requirements
  • ability to handle large blocks of data which are gigabytes in size.
  • need extremely efficient distribution of operations across nodes to reduce bottlenecks

One benefit the developers of GFS had was that since it was an in-house application they could control the environment, the client applications and the libraries a lot better than in the off-the-shelf case.

GFS Server Architecture

There are two server types in the GFS system.

Master servers
These keep the metadata on the various data files (in 64MB chunks) within the file system. Client applications talk to the master servers to perform metadata operations on files or to locate the chunk server that contains the actual bits on disk.
Chunk servers
These contain the actual bits on disk and can be considered to be dumb file servers. Each chunk is replicated across three different chunk servers to create redundancy in case of server crashes. Once a master server has directed them to the chunk server which contains the chunk they want, client applications retrieve data files directly from that chunk server.
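Here is a toy, in-memory sketch of the read path described above. The class and method names are my own inventions for illustration and are not the actual GFS client API.

CHUNK_SIZE = 8                             # real GFS uses 64MB chunks; tiny here for the demo

class ChunkServer:
    def __init__(self):
        self.chunks = {}                   # chunk handle -> bytes held by this server
    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class Master:
    def __init__(self):
        self.files = {}                    # path -> list of (chunk handle, replica servers)
    def locate_chunk(self, path, index):
        return self.files[path][index]

def read(master, path, offset, length):
    handle, replicas = master.locate_chunk(path, offset // CHUNK_SIZE)
    for server in replicas:                # the data never flows through the master
        try:
            return server.read(handle, offset % CHUNK_SIZE, length)
        except KeyError:
            continue                       # this replica lost the chunk, try another copy
    raise IOError("no replica could serve chunk %r" % handle)

# Wire up one file split across two chunks, each stored on two chunk servers.
s1, s2 = ChunkServer(), ChunkServer()
s1.chunks["c0"] = s2.chunks["c0"] = b"hello wo"
s1.chunks["c1"] = s2.chunks["c1"] = b"rld!"
m = Master()
m.files["/logs/day1"] = [("c0", [s1, s2]), ("c1", [s2, s1])]
print(read(m, "/logs/day1", 0, 8))         # b'hello wo'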

There are currently over 200 GFS clusters at Google, some of which have over 5000 machines. They now have pools of tens of thousands of machines retrieving data from GFS clusters that run as large as 5 petabytes of storage with read/write throughput of over 40 gigabytes/second across the cluster.

MapReduce

At Google they do a lot of processing of very large amounts of data. In the old days, developers would have to write their own code to partition the large data sets, checkpoint code and save intermediate results, handle failover in case of server crashes, and so on, as well as actually writing the business logic for the data processing they wanted to do, which could have been something straightforward like counting the occurrence of words in various Web pages or grouping documents by content checksums. The decision was made to reduce the duplication of effort and complexity of performing data processing tasks by building a platform technology that everyone at Google could use which handled all the generic tasks of working on very large data sets. So MapReduce was born.

MapReduce is an application programming interface for processing very large data sets. Application developers feed in a key/value pair (e.g. a {URL, HTML content} pair), then use the map function to extract relevant information from each record, which should produce a set of intermediate key/value pairs (e.g. {word, 1} pairs for each time a word is encountered), and finally the reduce function merges the intermediate values associated with the same key to produce the final output (e.g. {word, total count of occurrences} pairs).
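As a concrete illustration of the programming model, here is the word counting example written as a minimal map/shuffle/reduce pipeline in plain Python rather than with Google's MapReduce library. The sample documents are made up.

from collections import defaultdict

def map_fn(url, html):
    for word in html.split():
        yield word, 1                      # intermediate {word, 1} pairs

def reduce_fn(word, counts):
    return word, sum(counts)               # {word, total count of occurrences}

documents = {"http://example.org/a": "the cat sat on the mat",
             "http://example.org/b": "the dog sat"}

intermediate = defaultdict(list)           # the framework's shuffle/group-by-key step
for url, html in documents.items():
    for word, count in map_fn(url, html):
        intermediate[word].append(count)

print(dict(reduce_fn(w, c) for w, c in intermediate.items()))
# {'the': 3, 'cat': 1, 'sat': 2, 'on': 1, 'mat': 1, 'dog': 1}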

A developer only has to write their specific map and reduce operations for their data sets which could run as low as 25 - 50 lines of code while the MapReduce infrastructure deals with parallelizing the task and distributing it across different machines, handling machine failures and error conditions in the data, optimizations such as moving computation close to the data to reduce I/O bandwidth consumed, providing system monitoring and making the service scalable across hundreds to thousands of machines.

Currently, almost every major product at Google uses MapReduce in some way. There are 6000 MapReduce applications checked into the Google source tree, with hundreds of new applications that utilize it being written each month. To illustrate its ease of use, a graph of new MapReduce applications checked into the Google source tree over time shows a spike every summer as interns show up and create a flood of new MapReduce applications that are then checked in.

MapReduce Server Architecture

There are three server types in the MapReduce system.

Master server
This assigns user tasks to map and reduce servers as well as keeps track of the state of these tasks.
Map Servers
Accept user input and perform the map operation on it, then write the results to intermediate files.
Reduce Servers
Accept the intermediate files produced by map servers and perform the reduce operation on them.

One of the main issues they have to deal with in the MapReduce system is the problem of stragglers. Stragglers are servers that run slower than expected for one reason or the other. Sometimes stragglers may be due to hardware issues (e.g. a bad hard drive controller causes reduced I/O throughput) or may just be from the server running too many complex jobs which utilize too much CPU. To counter the effects of stragglers, they now assign multiple servers the same jobs, which counterintuitively ends up making tasks finish quicker. Another clever optimization is that all data transferred between map and reduce servers is compressed since the servers usually aren't CPU bound, so compression/decompression costs are a small price to pay for bandwidth and I/O savings.
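As an illustration of the backup task idea, here is a toy sketch that submits the same task to two workers and takes whichever copy finishes first. This is just the concept expressed with Python threads, not Google's actual scheduler.

import concurrent.futures
import time

def run_task(task_id, worker, delay):
    time.sleep(delay)                      # a large delay simulates a straggler
    return f"task {task_id} finished on {worker}"

pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
futures = [
    pool.submit(run_task, 7, "straggler-node", 3.0),
    pool.submit(run_task, 7, "healthy-node", 0.2),
]
done, _ = concurrent.futures.wait(
    futures, return_when=concurrent.futures.FIRST_COMPLETED)
print(next(iter(done)).result())           # the healthy copy wins
pool.shutdown(wait=False)                  # the straggler's result is simply ignored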

BigTable

After the creation of GFS, the need for structured and semi-structured storage that went beyond opaque files became clear. Examples of situations that could benefit from this included

  • associating metadata with a URL such as when it was crawled, its PageRank™, contents, links to it, etc
  • associating data with a user such as the user's search history and preferences
  • geographical data such as information about roads and satellite imagery

The system required would need to be able to scale to storing billions of URLs, hundreds of terabytes of satellite imagery, preference data associated with hundreds of millions of users and more. It was immediately obvious that this wasn't a task for an off-the-shelf commercial database system due to the scale requirements and the fact that such a system would be prohibitively expensive even if it did exist. In addition, an off-the-shelf system would not be able to make optimizations based on the underlying GFS file system. Thus BigTable was born.

BigTable is not a relational database. It does not support joins nor does it support rich SQL-like queries. Instead it is more like a multi-level map data structure. It is a large scale, fault tolerant, self managing system with terabytes of memory and petabytes of storage space which can handle millions of reads/writes per second. BigTable is now used by over sixty Google products and projects as the platform for storing and retrieving structured data.

The BigTable data model is fairly straightforward, each data item is stored in a cell which can be accessed using its {row key, column key, timestamp}. The need for a timestamp came about because it was discovered that many Google services store and compare the same data over time (e.g. HTML content for a URL). The data for each row is stored in one or more tablets which are actually a sequence of 64KB blocks in a data format called SSTable.
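To make the data model concrete, here is a toy multi-level map keyed by {row key, column key, timestamp}. The class and its methods are made up for illustration and are not the BigTable API.

from collections import defaultdict

class ToyBigTable:
    def __init__(self):
        # row key -> column key -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        self.rows[row][column].append((timestamp, value))
        self.rows[row][column].sort(reverse=True)

    def get(self, row, column, timestamp=None):
        versions = self.rows[row][column]
        if timestamp is None:
            return versions[0][1]          # latest version of the cell
        for ts, value in versions:         # newest value at or before the timestamp
            if ts <= timestamp:
                return value
        raise KeyError((row, column, timestamp))

t = ToyBigTable()
t.put("com.example.www/index.html", "contents:", 1, "<html>v1</html>")
t.put("com.example.www/index.html", "contents:", 2, "<html>v2</html>")
print(t.get("com.example.www/index.html", "contents:"))        # newest version
print(t.get("com.example.www/index.html", "contents:", 1))     # as of timestamp 1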

BigTable Server Architecture

There are three primary server types of interest in the BigTable system.

Master servers
Assigns tablets to tablet servers, keeps track of where tablets are located and redistributes tablets as needed.
Tablet servers
Handle read/write requests for tablets and split tablets when they exceed size limits (usually 100MB - 200MB). If a tablet server fails, then 100 tablet servers each pick up 1 new tablet and the system recovers.
Lock servers
These are instances of the Chubby distributed lock service. Lots of actions within BigTable require acquisition of locks including opening tablets for writing, ensuring that there is no more than one active Master at a time, and access control checking.

There are a number of optimizations which applications can take advantage of in BigTable. One example is the concept of locality groups. For example, some of the simple metadata associated with a particular URL which is typically accessed together (e.g. language, PageRank™, etc) can be physically stored together by placing it in a locality group while other columns (e.g. content) are in a separate locality group. In addition, tablets are usually kept in memory until the machine is running low on memory, at which point their data is written to GFS as an SSTable and a new in-memory table is created. This process is called compaction. There are other types of compactions where in-memory tables are merged with SSTables on disk to create an entirely new SSTable which is then stored in GFS.
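As an illustration of the compaction idea, here is a toy tablet that buffers writes in an in-memory table and flushes it to an immutable, sorted table once it grows too large. The names and threshold are made up and this is not BigTable's actual implementation.

MEMTABLE_LIMIT = 4                         # flush after this many in-memory entries (tiny for the demo)

class ToyTablet:
    def __init__(self):
        self.memtable = {}                 # recent writes held in memory
        self.sstables = []                 # flushed, immutable, sorted tables

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.flush()

    def flush(self):
        # The compaction step: freeze the memtable as a sorted, immutable table.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest flushed table first
            if key in table:
                return table[key]
        raise KeyError(key)

t = ToyTablet()
for i in range(6):
    t.write(f"row{i}", f"value{i}")
print(t.read("row1"), len(t.sstables))     # value1 1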

Current Challenges Facing Google's Infrastructure

Although Google's infrastructure works well at the single cluster level, there are a number of areas with room for improvement including

  • support for geo-distributed clusters
  • single global namespace for all data since currently data is segregated by cluster
  • more and better automated migration of data and computation
  • lots of consistency issues when you couple wide area replication with network partitioning (e.g. keeping services up even if a cluster goes offline for maintenance or due to some sort of outage).

Recruiting Sales Pitch

[The conference was part recruiting event so some of the speakers ended their talks with a recruiting spiel - Dare]

Having access to lots of data and computing power is a geek playground. You can build cool, seemingly trivial apps on top of the data which turn out to be really useful, such as Google Trends and catching misspellings of "britney spears". Another example of the kind of app you can build when you have enough data is treating the problem of language translation as a statistical modeling problem, which turns out to be one of the most successful approaches around.

Google hires smart people and lets them work in small teams of 3 to 5 people. They can get away with teams being that small because they have the benefit of an infrastructure that takes care of all the hard problems so devs can focus on building interesting, innovative apps.


 

Categories: Platforms | Trip Report

June 23, 2007
@ 03:32 PM

THEN: The PayPerPost Virus Spreads

Two new services that are similar to the controversial PayPerPost have announced their launch in the last few days: ReviewMe and CreamAid. PayPerPost, a marketplace for advertisers to pay bloggers to write about products (with or without disclosure), recently gained additional attention when they announced a $3 million round of venture financing.

The PayPerPost model brings up memories of payola in the music industry, something the FCC and state attorney generals are still trying to eliminate or control. Given the distributed and unlicensed nature of the blogosphere, controlling payoffs to bloggers will be exponentially more difficult.

Our position on these pay-to-shill services is clear: they are a natural result of the growth in size and influence of the blogosphere, but they undermine the credibility of the entire ecosystem and mislead readers.

NOW: I’m shocked, shocked to find that gambling is going on in here!

The title, which is a quote from the movie Casablanca, is what came to mind tonight when I read the complete train wreck occurring on TechMeme over advertisements that contain a written message from the publisher. The whole thing was started by Valleywag of course.

The ads in question are a staple of FM Publishing - a standard ad unit contains a quote by the publisher saying something about something. It isn’t a direct endorsement. Rather, it’s usually an answer to some lame slogan created by the advertiser. It makes the ad more personal and has a higher click through rate, or so we’ve been told. In the case of the Microsoft ad, we were quoted how we had become “people ready,” whatever that means. See our answer and some of the others here (I think it will be hard to find this text controversial, or anything other than extremely boring). We do these all the time…generally FM suggests some language and we approve or tweak it to make it less lame. The ads go up, we get paid. This has been going on for months and months - at least since the summer of 2006. It’s nothing new. It’s text in an ad box. I think people are pretty aware of what that means…which is nothing.

Any questions?


 

Categories: Current Affairs

I was reading reddit this morning and spotted a reference to the Microsoft Popfly team's group picture which pointed out that, going by the job titles in the pic, there were 9 managers and 5 developers on the product team. The list of people in the picture and their titles is excerpted below

From left to right: John Montgomery (Group Program Manager), Andy Sterland (Program Manager), Alpesh Gaglani (Developer), Tim Rice (Developer), Suzanne Hansen (Program Manager), Steven Wilssens (Program Manager), Vinay Deo (Engineering Manager), Michael Leonard (Test Developer), Jianchun Xu (Developer), Dan Fernandez (Product Manager), Adam Nathan (Developer), Wes Hutchins (Program Manager), Aaron Brethorst (Program Manager), Paramesh Vaidyanathan (Product Unit Manager), and Murali Potluri (Developer).

A Microsoft employee followed up the reddit link with a comment pointing out that it is actually 5 devs, 5 PMs, 1 tester, 3 managers and 1 marketer. This sounds a lot better but I still find it interesting that there is a 1:1 ratio of Program Managers (i.e. design features/APIs, write specs, call meetings) to Developers (i.e. write code, fix bugs, ignore PMs). Although this ratio isn't unusual for Microsoft, it has always struck me as rather high. I've always felt that a decent ratio of PMs to developers is more like 1:2 or higher. And I've seen some claim ratios like 1 PM to 5 developers for Agile projects but haven't been able to find much about industry averages online. It seems most discussion about staffing ratios on software projects focuses on Developer to Test ratios, and even then the conclusion is "it depends". I think the PM to Developer ratio question is more clear cut.

What are good ratios that have worked for you in the past and what would you consider to be a bad ratio?

PS: A note underneath the group picture mentions that some folks on the team aren't pictured but I looked them up and they are in marketing so they aren't relevant to this discussion.


 

Categories: Programming

  1. XKCD: Pickup Lines: "If I could rearrange the alphabet..."

  2. Chart: Chances of a Man Winning an Argument plotted over Time: I'm in the middle period. :)

  3. Fake Steve Jobs: Microsoft Goes Pussy: "We've integrated search into our OS too. It makes sense. And Microsoft's search stuff in Vista is really good (God I just threw up in my mouth when I wrote that)..."

  4. Chris Kelly: Das Capital One: "Back before Capital One, there were just two kinds of consumers: People who could afford credit cards and people who couldn't afford credit cards...The guy who started Capital One imagined a third kind of person - someone who could almost afford a credit card. A virtual credit card holder. Something between a good risk and a social parasite."

  5. I CAN HAS CHEEZBURGER?: OH HAI GOOGLZ: Google Street View + lolcats = Comedic Gold

  6. Bileblog: Google Code - Ugliness is not just skin deep: "The administrative menu is, to put it as kindly as possible, whimsical. Menu items and options are scattered about like goat pebbleturds on a mountain. The only option under ‘Advanced’ is ‘Delete this project’. How is that advanced functionality?"

  7. Wikipedia: Pokémon test: "Each of the 493 Pokémon has its own page, all of which are bigger than stubs. While it would be expected that Pikachu would have its own page, some might be surprised to find out that Stantler has its own page, as well. Some people perceive Pokémon as something 'for little kids' and argue that if that gets an article, so should their favorite hobby/band/made-up word/whatever."

  8. YouTube: A Cialis Ad With Cuba Gooding Jr.: From the folks at NationalBanana, lots of funny content on their site.

  9. Bumper Sticker: Hell Was Full: Saw this on my way to work.

  10. YouTube: Microsoft Surface Parody - "The future is here and it's not an iPhone. It's a big @$$ table. Take that Apple"


 

According to my Feedburner stats it seems I lost about 214 subscribers using Google Reader between Saturday June 16th and Sunday June 17th. This seems like a fairly significant number of readers to unsubscribe from my blog on a weekend especially since I don't think I posted anything particularly controversial relative to my regular posts.

I was wondering if any other Feedburner users noticed a similar dip in their subscribers numbers via Google Reader over the weekend or whether it's just a coincidence that I happened to lose so many regular readers at once?


 

Categories: Personal