These are my notes from the talk Using MapReduce on Large Geographic Datasets by Barry Brummit.
Most of this talk was a repetition of the material in the previous talk by Jeff Dean, including many of the same slides. My notes primarily contain material I felt was unique to this talk.
A common pattern across a lot of Google services is creating index files that point into the underlying data and loading them into memory to make lookups fast. This is also done by the Google Maps team, which has to handle massive amounts of data (e.g. there are over a hundred million roads in North America).
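The index-file pattern described above can be sketched roughly as follows. This is my own illustration, not Google's actual implementation; the file format and function names are hypothetical. The idea is that an index maps each record key to its byte offset in a large data file, so a lookup costs one in-memory dict probe plus one seek instead of a linear scan:

```python
import json

def build_index(data_path, index_path):
    """Scan a tab-separated data file once, recording the byte offset
    where each record (keyed by its first field) begins."""
    index = {}
    with open(data_path, "rb") as f:
        offset = 0
        for line in f:
            key = line.split(b"\t", 1)[0].decode()
            index[key] = offset
            offset += len(line)
    with open(index_path, "w") as out:
        json.dump(index, out)

def lookup(data_path, index, key):
    """With the index held in memory, each lookup is a single seek."""
    with open(data_path, "rb") as f:
        f.seek(index[key])
        return f.readline().decode().rstrip("\n")
```

Loading the JSON index into a dict at startup trades memory for latency, which is the point of the pattern: the bulky road data stays on disk while lookups stay fast.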
Below are examples of the kinds of problems the Google Maps team has used MapReduce to solve.
When issues are encountered in a MapReduce job, developers can debug them by running the application locally on their desktops.
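Local debugging works because the map/shuffle/reduce logic can be exercised in a single process, independent of the cluster. A toy in-process runner (my own sketch, not Google's API) makes this concrete:

```python
from collections import defaultdict

def local_mapreduce(records, mapper, reducer):
    """Run map, shuffle, and reduce in one process -- enough to debug
    the job's logic before submitting it to a real cluster."""
    # Map phase: each input record emits zero or more (key, value) pairs.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)
    # Reduce phase: each key's values are combined into one result.
    return {key: reducer(key, values)
            for key, values in intermediate.items()}

# Hypothetical example: count road segments per state
# from "state,road" input lines.
def count_mapper(line):
    state, _road = line.split(",", 1)
    yield state, 1

def count_reducer(key, values):
    return sum(values)
```

Because the mapper and reducer are plain functions, they can be unit-tested on a desktop with a small input sample and then handed unchanged to the distributed runtime.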
Developers who would like to harness the power of a several-hundred to several-thousand node cluster but do not work at Google can try
The Google infrastructure is the product of Google's engineering culture, which has the following ten characteristics
Q: Where are intermediate results from map operations stored?
A: In BigTable or
Q: Can you use MapReduce incrementally? For example, when new roads are built in North America, do we have to run MapReduce over the entire data set or can we factor in only the changed data?
A: Currently, you'll have to process the entire data set again. However, this is the target of a lot of active research at Google since it affects many teams.
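The distinction in that answer can be illustrated with a small sketch (my own, using a hypothetical per-state road count). Full recomputation reruns the job over everything; an incremental update folds only the delta into the previous result, which happens to work here because counting is associative, but is not possible for arbitrary MapReduce jobs, which is why this remains a research problem:

```python
from collections import Counter

def full_recount(all_roads):
    """What the answer above describes: rerun over the entire data set.
    Input: iterable of (state, road) pairs."""
    return Counter(state for state, _road in all_roads)

def incremental_update(previous_counts, new_roads):
    """Hypothetical incremental alternative: combine the old result with
    counts from only the newly added roads. Valid here because addition
    of counts is associative and commutative; a general MapReduce job
    need not decompose this way."""
    return previous_counts + Counter(state for state, _road in new_roads)
```

For an associative, commutative reduction like this, the incremental path touches only the new records, yet produces the same answer as rerunning over the full data set.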
The opinions expressed herein are my own personal opinions and do not represent
my employer's view in any way.
© Copyright 2013, Dare Obasanjo