Building Scalable Databases: Are Relational Databases Compatible with Large Scale Websites?

March 10, 2010

@ 03:06 PM

A few weeks ago Todd Hoff over on the High Scalability blog penned a blog post titled MySQL and Memcached: End of an Era? where he wrote

If you look at the early days of this blog, when web scalability was still in its heady bloom of youth, many of the articles had to do with leveraging MySQL and memcached. Exciting times. Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together. That was state of the art, that was how it was done. The architecture of many major sites still follow this pattern today, largely because with enough elbow grease, it works.
…
With a little perspective, it's clear the MySQL+memcached era is passing.
…
LinkedIn has moved on with their Project Voldemort. Amazon went there a while ago.

Digg declared their entrance into a new era in a post on their blog titled Looking to the future with Cassandra,
…
Twitter has also declared their move in the article Cassandra @ Twitter: An Interview with Ryan King.

Todd’s blog has been a useful source of information on the topic of scaling large scale websites since he catalogues as many presentations as he can find from industry leaders on how they’ve designed their systems to deal with millions to hundreds of millions of users pounding their services a day. What he’s written above is really an observation about industry trends and isn’t really meant to attack any technology. I did find it interesting that many took it as an attack on memcached and/or relational databases and came out swinging.

One post which I thought tried to take a balanced approach to rebuttal was Dennis Forbes’ Getting Real about NoSQL and the SQL-Isn't-Scalable Lie where he writes

I work in the financial industry. RDBMS’ and the Structured Query Language (SQL) can be found at the nucleus of most of our solutions. The same was true when I worked in the insurance, telecommunication, and power generation industries. So it piqued my interest when a peer recently forwarded an article titled “The end of SQL and relational databases”, adding the subject line “We’re living in the past”. [Though as Michael Stonebraker points out, SQL the query language actually has remarkably little to actually to do with the debate. It would be more clearly called NoACID]
…
From a vertical scaling perspective — it’s the easiest and often the most computationally effective way to scale (albeit being very inefficient from a cost perspective) — you have the capacity to deploy your solution on powerful systems with armies of powerful cores, hundreds of GBs of memory, operating against SAN arrays with ranks and ranks of SSDs.

The computational and I/O capacity possible on a single “machine” are positively enormous. The storage system, which is the biggest limiting factor on most database platforms, is ridiculously scalable, especially in the bold new world of SSDs (or flash cards like the FusionIO).
…
From a horizontal scaling perspective you can partition the data across many machines, ideally configuring each machine in a failover cluster so you have complete redundancy and availability. With Oracle RAC and Sybase ASE you can even add the classic clustering approach. Such a solution — even on a stodgy old RDBMS — is scalable far beyond any real world need because you’ve built a system for a large corporation, deployed in your own datacenter, with few constraints beyond the limits of technology and the platform.

Your solution will cost hundreds of thousands of dollars (if not millions) to deploy, but that isn’t a critical blocking point for most enterprises.This sort of scaling that is at the heart of virtually every bank, trading system, energy platform, retailing system, and so on.

To claim that SQL systems don’t scale, in defiance of such obvious and overwhelming evidence, defies all reason.

There’s lots of good for food for thought in both blog posts. Todd is right that a few large scale websites are moving beyond the horizontal scaling approach that Dennis brought up in his rebuttal based on their experiences. What tends to happen once you’ve built a partitioned/sharded SQL database architecture is that you tend to notice that you’ve given up most of the features of an ACID relational database. You give up the advantages of the relationships by eschewing foreign keys, triggers and joins since these are prohibitively expensive to run across multiple databases. Denormalizing the data means that you give up on Atomicity, Consistency and Isolation when updating or retrieving results. And the end all you have left is that your data is Durable (i.e. it is persistently stored) which isn’t much better than you get from a dumb file system. Well, actually you also get to use SQL as your programming model which is nicer than performing direct file I/O operations.

It is unsurprising that after being at this point for years, some people in our industry have wondered whether it doesn’t make more sense to use data stores that are optimized for the usage patterns of large scale websites instead of gloriously misusing relational databases. A good example of the tradeoffs is the blog post from the Digg team on why they switched to Cassandra. The database was already sharded which made performing joins to calculate results of queries such as “which of my friends Dugg this item?” to be infeasible. So instead they had to perform two reads from SQL (all Diggs on an item and all of the user’s friends) then perform the intersection operation on the PHP front end code. If the item was not already cached, this leads to disk I/O which could take seconds. To make the situation worse, you actually want to perform this operation multiple times on a single page view since it is reasonable to expect multiple Digg buttons on a page if it has multiple stories on it.

An alternate approach is to denormalize the data and for each user store a list of stories that have been Dugg by at least one of their friends. So whenever I Digg an item, an entry is placed in each of my friends’ lists to indicate that story is now one that has been Dugg by a friend. That way when the a friend of mine shows up, it is a simple lookup to say “is this story ID on the list of stories Dugg by one of their friends?” The challenge here is that it means Digging an item can result in literally thousands of logical write operations. It has been traditionally prohibitively expensive to incur such massive amounts of write I/O in relational databases with all of their transactionality and enforcing of ACID constraints. NoSQL databases like Cassandra which assume your data is denormalized are actually optimized for write I/O heavy operations given the necessity of having to perform enormous amounts of writes to keep data consistent.

Digg’s usage of Cassandra actually serves as a rebuttal to Dennis Forbes’ article since they couldn’t feasibly get what they want with either horizontal or vertical scaling of their relational database-based solution. I would argue that introducing memcached into the mix would have addressed disk I/O concerns because all records of who has Dugg an item could be stored in-memory so comparisons of which of my friends have Dugg an item never have to go to disk to answer any parts of the query. The only caveat with that approach is that RAM is more expensive than disk so you’ll need a lot more servers to store 3 terabytes of data in memory than you would on disk.

However, the programming model is not the only factor one most consider when deciding whether to stay with a sharded/partitioned relational database versus going with a NoSQL solution. The other factor to consider is the actual management of the database servers. The sorts of questions one has to ask when choosing a database solution are listed in the interview with Ryan King of Twitter where he lists the following checklist that they evaluated before deciding to go with Cassandra over MySQL

We first evaluated them on their architectures by asking many questions along the lines of:

How will we add new machines?

Are their any single points of failure?

Do the writes scale as well?

How much administration will the system require?

If its open source, is there a healthy community?

How much time and effort would we have to expend to deploy and integrate it?

Does it use technology which we know we can work with?

The problem with database sharding is that it isn’t really a supported out of the box configuration for your traditional relational database product especially the open source ones. How your system deals with new machines being added to the cluster or handles machine failure often requires special case code being written by application developers along with special hand holding by operations teams. Dealing with issues related to database replication (whether it is multi-master or single master) also often takes up unexpected amounts of manpower once sharding is involved.

For these reasons I expect we’ll see more large scale websites decide that instead of treating a SQL database as a denormalized key-value pair store that they would rather use a NoSQL database. However I also suspect that a lot of services who already have a sharded relational database + in-memory cache solution can get a lot of mileage from more judicious usage of in-memory caches before switching. This is especially true given that you still caches in front of your NoSQL databases anyway. There’s also the question of whether traditional relational database vendors will add features to address the shortcomings highlighted by the NoSQL movement? Given that the sort of companies adopting NoSQL are doing so because they want to save costs on software, hardware and operations I somehow doubt that there is a lucrative market here for database vendors versus adding more features that the banks, insurance companies and telcos of the world find interesting.

Note Now Playing: Birdman - Money To Blow (featuring Drake & Lil Wayne Note