Greg Linden on Google's Personalized Search

September 1, 2006

@ 07:35 PM

Greg Linden has a blog post entitled Google Personalized Search and Bigtable where he writes

One tidbit I found curious in the Google Bigtable paper was this hint about the internals of Google Personalized Search:
Personalized Search generates user profiles using a MapReduce over Bigtable. These user profiles are used to personalize live search results.
This appears to confirm that Google Personalized Search works by building high-level profiles of user interests from their past behavior.

I would guess it works by determining subject interests (e.g. sports, computers) and biasing all search results toward those categories. That would be similar to the old personalized search in Google Labs (which was based on Kaltix technology) where you had to explicitly specify that profile, but now the profile is generated implicitly using your search history.

My concern with this approach is that it does not focus on what you are doing right now, what you are trying to find, your current mission. Instead, it is a coarse-grained bias of all results toward what you generally seem to enjoy.

This problem is worse if the profiles are not updated in real time.

I totally disagree with Greg here on almost every point. Building a profile of a user's interests to improve their search results is totally different from improving their search results in realtime. The former is personalized search while the latter is more akin to clustering of search results. For example, if I search for "football", a search engine can either use the fact that I've searched for soccer related terms in the past to bubble up the offical website of Fédération Internationale de Football Association (FIFA) instead of the National Football League (NFL) website in the search results or it could cluster the results of the search so I see all the options. Ideally, it should do both. However, expecting that my profile is built in realtime (e.g. learning from my search results from five minutes ago as opposed to those from five days ago) although ideal doesn't seem to me to be necessary to be beneficial to end users. This seems like one of those places where a good enough offline-processing based solution is better than a ~~over~~ better engineered real-time solution. Search is rarely about returning or reacting to realtime data anyway. :)

PS: I do think it's quite interesting to see how many Google applications are built on BigTable and MapReduce. From the post Namespaced Extensions in Feeds it looks like Google Reader is another example.

Categories: Competitors/Web Companies

« (thread.Priority = ThreadPriority.BelowN... | Home | When a Plan Comes Together »

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Greg Linden on Google's Personalized Search - Dare Obasanjo's weblog