The top story in my favorite RSS reader is the article "MapReduce: A major step backwards" by David J. DeWitt and Michael Stonebraker. This is one of those articles that is so bad you feel dumber after having read it. The primary thesis of the article is:

 MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is:

  1. A giant step backward in the programming paradigm for large-scale data intensive applications
  2. A sub-optimal implementation, in that it uses brute force instead of indexing
  3. Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago
  4. Missing most of the features that are routinely included in current DBMS
  5. Incompatible with all of the tools DBMS users have come to depend on

One of the worst things about articles like this is that they get usually reasonable, intelligent-sounding people spouting off bogus responses as knee-jerk reactions to the article's stupidity. A typical bogus reaction is Rich Skrenta's post "Database gods bitch about mapreduce", which talks of "disruption" as if Google's MapReduce were actually comparable to a relational database management system.

On the other hand, the good thing about articles like this is that you often get great responses from smart folks that further your understanding of the subject matter even though the original article was crap. For example, take the post from Google employee Mark Chu-Carroll entitled "Databases are Hammers; MapReduce is a ScrewDriver", where he writes eloquently that

The beauty of MapReduce is that it's easy to write. M/R programs are really as easy as parallel programming ever gets. So, getting back to the article. They criticize MapReduce for, basically, not being based on the idea of a relational database.

That's exactly what's going on here. They've got their relational databases. RDBs are absolutely brilliant things. They're amazing tools, which can be used to build amazing software. I've done a lot of work using RDBs, and without them, I wouldn't have been able to do some of the work that I'm proudest of. I don't want to cut down RDBs at all: they're truly great. But not everything is a relational database, and not everything is naturally suited towards being treated as if it were relational. The criticisms of MapReduce all come down to: "But it's not the way relational databases would do it!" - without ever realizing that that's the point. RDBs don't parallelize very well: how many RDBs do you know that can efficiently split a task among 1,000 cheap computers? RDBs don't handle non-tabular data well: RDBs are notorious for doing a poor job on recursive data structures. MapReduce isn't intended to replace relational databases: it's intended to provide a lightweight way of programming things so that they can run fast by running in parallel on a lot of machines. That's all it was intended to do.

Mark’s entire post is a great read.
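
For anyone who hasn't seen the programming model Mark is describing, here is a toy sketch of it in Python. To be clear about what is assumed: this is not Google's implementation or its API, the function names (map_words, shuffle, reduce_counts, word_count) are made up for illustration, and it runs on a thread pool inside one process rather than on a cluster. The point is only that the programmer writes a map function and a reduce function, and everything in between is plumbing a framework could distribute.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_chunks):
    """Group emitted values by key, the job a framework does between phases."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all values seen for one key into one result."""
    return key, sum(values)


def word_count(documents, workers=4):
    # Each document is mapped independently, so the map calls can run in
    # parallel. Threads stand in here for the many machines of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, documents))
    groups = shuffle(mapped)
    # Each key could also be reduced independently; done serially for brevity.
    return dict(reduce_counts(key, values) for key, values in groups.items())
```

Calling word_count(["the hammer", "the screwdriver"]) returns a dictionary with "the" counted twice and the other two words once each. Notice there is no schema, no index and no query language anywhere in there, which is rather the point of the hammer versus screwdriver comparison.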

Greg Jorgensen also has a good rebuttal in his post "Relational Database Experts Jump The MapReduce Shark", which points out that if the original article had been a critique of Web-based structured data storage systems such as Amazon’s SimpleDB or Google Base, then the comparison might have been almost logical as opposed to completely ridiculous. ;)

Now playing: Marvin Gaye - I Heard It Through the Grapevine


 

Wednesday, 23 January 2008 15:40:39 (GMT Standard Time, UTC+00:00)
I think maybe they had BigTable in mind instead of MapReduce
as I wrote on my blog (http://www.rgoarchitects.com/nblog/2008/01/19/TheDBMSVsMapReduceIsThatReallyACompetition.aspx)
and it still doesn't make much sense ;)

Arnon
Monday, 28 January 2008 19:07:56 (GMT Standard Time, UTC+00:00)
There are a few complex mobile, real-time workflow projects that started to fail due to the recruited developer corps' addiction to the RDBMS. When the system has to track many concurrent objects with real-time token status updates, most RDB systems become write-saturated.

There are several great embedded object databases that have proven themselves in such projects, but the majority of the talent out there are virgins to such systems.