In my previous post, I mentioned that I'm in the early stages of building an application on the Facebook platform. I haven't yet decided on an application, but for now let's assume that it is a Favorite Comic Books application which allows me to store my favorite comic books and shows me the most popular comic books among my friends.

After investigating using Amazon's EC2 + S3 to build my application, I've decided that I'm better off using a traditional hosting solution running on either the LAMP or WISC platform. One of the things I've been looking at is which platform has better out-of-the-box support for an in-memory caching solution that works well in the context of a Web farm (i.e. multiple Web servers). While working on the platforms behind several high traffic Windows Live services I've learned that you should be prepared for dealing with scalability issues and that caching is one of the best ways to get bang for the buck when improving the scalability of your service.

I recently discovered memcached, which is a distributed object caching system originally developed by Brad Fitzpatrick of LiveJournal fame. You can think of memcached as a giant hash table that can run on multiple servers. It automatically maintains the balance of objects hashed to each server and transparently fetches/removes objects over the network if they aren't on the same machine that is accessing an object in the hash table. Although this sounds fairly simple, there is a lot of grunt work in building a distributed object cache which handles data partitioning across multiple servers and hides the distributed nature of the application from the developer. memcached is well integrated into the typical LAMP stack and is used by a surprising number of high traffic websites including Slashdot, Facebook, Digg, Flickr and Wikipedia. Below is what C# code that utilizes memcached would look like, sans exception handling code:

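// Assumes System.Collections and System.Data.SqlClient are imported, and that
// myCache (a memcached client) and dbConnection (a SqlConnection) are fields.
public ArrayList GetFriends(int user_id){

    ArrayList friends = (ArrayList) myCache.Get("friendslist:" + user_id);

    if(friends == null){
        // Cache miss: build the list from the database
        friends = new ArrayList();

        // Open the connection
        dbConnection.Open();

        // Parameterized query instead of concatenating the ID into the SQL text
        SqlCommand cmd = new SqlCommand("select friend_id from friends_list where owner_id = @owner_id", dbConnection);
        cmd.Parameters.AddWithValue("@owner_id", user_id);

        SqlDataReader reader = cmd.ExecuteReader();

        // Add each friend ID to the list
        while (reader.Read()){
            friends.Add(reader[0]);
        }

        reader.Close();
        dbConnection.Close();

        // Store the list so later requests skip the database
        myCache.Set("friendslist:" + user_id, friends);
    }

    return friends;
}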

public void AddFriend(int user_id, int new_friend_id){

    // Open the connection
    dbConnection.Open();

    SqlCommand cmd = new SqlCommand("insert into friends_list (owner_id, friend_id) values (@owner_id, @friend_id)", dbConnection);
    cmd.Parameters.AddWithValue("@owner_id", user_id);
    cmd.Parameters.AddWithValue("@friend_id", new_friend_id);
    cmd.ExecuteNonQuery();

    // Remove key from cache since friends list has been updated
    myCache.Delete("friendslist:" + user_id);

    dbConnection.Close();
}

The benefits of using the cache should be pretty obvious. I no longer need to hit the database after the first request to retrieve the user's friend list, which means faster performance in servicing the request and less I/O. memcached automatically handles purging items out of the cache when it hits the size limit and also deciding which cache servers should hold individual key<->value pairs.
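To make those last two behaviors concrete, here is a minimal C# sketch of the idea rather than of memcached's actual implementation. It assumes the client picks a server by hashing the key and that each server evicts its least recently used entry once it is full. LruNode, ShardedCacheClient and the FNV-style hash are hypothetical names I picked for illustration, and the "servers" are plain in-process objects instead of machines on the network.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one cache server: fixed capacity, LRU eviction.
public class LruNode
{
    private readonly int capacity;
    private readonly Dictionary<string, LinkedListNode<KeyValuePair<string, object>>> index =
        new Dictionary<string, LinkedListNode<KeyValuePair<string, object>>>();
    private readonly LinkedList<KeyValuePair<string, object>> order =
        new LinkedList<KeyValuePair<string, object>>();

    public LruNode(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    public int Count { get { return index.Count; } }

    public object Get(string key)
    {
        LinkedListNode<KeyValuePair<string, object>> node;
        if (!index.TryGetValue(key, out node)) return null;
        order.Remove(node);   // move to the front: most recently used
        order.AddFirst(node);
        return node.Value.Value;
    }

    public void Set(string key, object value)
    {
        Delete(key);
        if (index.Count >= capacity)
        {
            // Evict the entry at the back of the list: least recently used
            string oldestKey = order.Last.Value.Key;
            order.RemoveLast();
            index.Remove(oldestKey);
        }
        index[key] = order.AddFirst(new KeyValuePair<string, object>(key, value));
    }

    public void Delete(string key)
    {
        LinkedListNode<KeyValuePair<string, object>> node;
        if (index.TryGetValue(key, out node))
        {
            order.Remove(node);
            index.Remove(key);
        }
    }
}

// Hypothetical client: every key is routed to exactly one server by its hash.
public class ShardedCacheClient
{
    private readonly LruNode[] servers;

    public ShardedCacheClient(int serverCount, int capacityPerServer)
    {
        servers = new LruNode[serverCount];
        for (int i = 0; i < serverCount; i++) servers[i] = new LruNode(capacityPerServer);
    }

    // FNV-1a style hash over the key's characters, so every Web server
    // running this code computes the same value for the same key.
    public static uint Hash(string key)
    {
        unchecked
        {
            uint hash = 2166136261u;
            foreach (char c in key)
            {
                hash ^= c;
                hash *= 16777619u;
            }
            return hash;
        }
    }

    public int ServerIndexFor(string key) { return (int)(Hash(key) % (uint)servers.Length); }

    public object Get(string key) { return servers[ServerIndexFor(key)].Get(key); }
    public void Set(string key, object value) { servers[ServerIndexFor(key)].Set(key, value); }
    public void Delete(string key) { servers[ServerIndexFor(key)].Delete(key); }
}

public static class Program
{
    public static void Main()
    {
        ShardedCacheClient cache = new ShardedCacheClient(3, 2);
        cache.Set("friendslist:42", new List<int> { 7, 9 });
        List<int> friends = (List<int>)cache.Get("friendslist:42");
        Console.WriteLine("friends cached: " + friends.Count);   // prints "friends cached: 2"
        Console.WriteLine("server index: " + cache.ServerIndexFor("friendslist:42"));
    }
}
```

The part worth noticing is that the calling code only ever sees Get, Set and Delete. Which server holds a key, and which entries get thrown away under memory pressure, never shows up in GetFriends or AddFriend. A plain modulo over the server count is the simplest possible routing and reshuffles most keys when a server is added or removed, so treat it as the naive version of the idea.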

I hang out with a number of Web developers on the WISC platform and I don't think I've ever heard anyone mention memcached or anything like it. In fact, I couldn't find a mention of it on Microsoft employee blogs, ASP.NET developer blogs or on MSDN. So I wondered what the average WISC developer uses as their in-memory caching solution.

After looking around a bit, I came to the conclusion that most WISC developers use the built-in ASP.NET caching features. ASP.NET provides a number of in-memory caching features including a Cache class which provides a similar API to memcached, page directives for caching portions of the page or the entire page and the ability to create dependencies between cached objects and the files or database tables/rows that they were populated from via the CacheDependency and SqlCacheDependency classes. Although some of these features are also available in various Open Source web development frameworks such as Ruby on Rails + memcached, none give as much functionality out of the box as ASP.NET or so it seems.

Below is what the code for the GetFriends and AddFriend methods would look like using the built-in ASP.NET caching features:

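// Additionally assumes System.Web.Caching is imported and Cache is the ASP.NET cache instance.
public ArrayList GetFriends(int user_id){

    ArrayList friends = (ArrayList) Cache.Get("friendslist:" + user_id);

    if(friends == null){
        // Cache miss: build the list from the database
        friends = new ArrayList();

        // Open the connection
        dbConnection.Open();

        // Parameterized query instead of concatenating the ID into the SQL text
        SqlCommand cmd = new SqlCommand("select friend_id from friends_list where owner_id = @owner_id", dbConnection);
        cmd.Parameters.AddWithValue("@owner_id", user_id);

        // Associate the dependency with the command before executing it
        SqlCacheDependency dependency = new SqlCacheDependency(cmd);
        SqlDataReader reader = cmd.ExecuteReader();

        // Add each friend ID to the list
        while (reader.Read()){
            friends.Add(reader[0]);
        }

        reader.Close();
        dbConnection.Close();

        // Insert friends list into cache with associated dependency
        Cache.Insert("friendslist:" + user_id, friends, dependency);
    }
    return friends;
}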

public void AddFriend(int user_id, int new_friend_id){
    // Open the connection
    dbConnection.Open();

    SqlCommand cmd = new SqlCommand("insert into friends_list (owner_id, friend_id) values (@owner_id, @friend_id)", dbConnection);
    cmd.Parameters.AddWithValue("@owner_id", user_id);
    cmd.Parameters.AddWithValue("@friend_id", new_friend_id);
    cmd.ExecuteNonQuery();

    /* no need to remove from cache because SqlCacheDependency takes care of that automatically */
    // Cache.Remove("friendslist:" + user_id);

    dbConnection.Close();
}

Using the SqlCacheDependency class gets around a significant limitation of the ASP.NET Cache class. Specifically, the cache is not distributed. This means that if you have multiple Web front ends, you'd have to write your own code to handle partitioning data and invalidating caches across your various Web server instances. In fact, there are numerous articles showing how to implement such a solution, including "Synchronizing the ASP.NET Cache across AppDomains and Web Farms" by Peter Bromberg and "Use Data Caching Techniques to Boost Performance and Ensure Synchronization" by David Burgett.

However, let's consider how SqlCacheDependency is implemented. If you are using SQL Server 7 or SQL Server 2000, then your ASP.NET process polls the database at regular intervals to determine whether the target(s) of the original query have changed. For SQL Server 2005, the database can be configured to send change notifications to the Web servers if the target(s) of the original query change. Either way, the database is doing work to determine if the data has changed. Compared to the memcached approach, this still doesn't seem as efficient as we can get if we want to eke out every last ounce of performance from the system, although it does lead to simpler code.
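The cost of the polling model is easier to see in a toy version. The sketch below is a simulation under my own assumptions, not ASP.NET's implementation: FakeFriendsTable stands in for the database and keeps a change counter, and PollingCache throws away its entries whenever a poll sees that the counter has moved. Both class names, and the VersionReads counter used to tally database round trips, are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a database table that tracks a change counter.
public class FakeFriendsTable
{
    private readonly Dictionary<int, List<int>> rows = new Dictionary<int, List<int>>();

    public int Version { get; private set; }
    public int VersionReads { get; private set; }   // round trips spent on polling

    public List<int> Select(int ownerId)
    {
        List<int> friends;
        // Return a copy so cached results do not change behind the cache's back
        return rows.TryGetValue(ownerId, out friends) ? new List<int>(friends) : new List<int>();
    }

    public void Insert(int ownerId, int friendId)
    {
        List<int> friends;
        if (!rows.TryGetValue(ownerId, out friends))
        {
            friends = new List<int>();
            rows[ownerId] = friends;
        }
        friends.Add(friendId);
        Version++;   // every write bumps the counter that pollers watch
    }

    public int ReadVersion()
    {
        VersionReads++;
        return Version;
    }
}

// Hypothetical local cache that is invalidated by polling the change counter.
public class PollingCache
{
    private readonly FakeFriendsTable table;
    private readonly Dictionary<string, List<int>> entries = new Dictionary<string, List<int>>();
    private int lastSeenVersion;

    public PollingCache(FakeFriendsTable table)
    {
        this.table = table;
        lastSeenVersion = table.ReadVersion();
    }

    // A real system would call this from a timer; here it is called by hand.
    public void Poll()
    {
        int current = table.ReadVersion();
        if (current != lastSeenVersion)
        {
            entries.Clear();
            lastSeenVersion = current;
        }
    }

    public List<int> GetFriends(int userId)
    {
        string key = "friendslist:" + userId;
        List<int> friends;
        if (entries.TryGetValue(key, out friends)) return friends;
        friends = table.Select(userId);
        entries[key] = friends;
        return friends;
    }
}

public static class Program
{
    public static void Main()
    {
        FakeFriendsTable table = new FakeFriendsTable();
        table.Insert(1, 2);
        PollingCache cache = new PollingCache(table);

        Console.WriteLine(cache.GetFriends(1).Count);   // prints 1
        table.Insert(1, 3);
        Console.WriteLine(cache.GetFriends(1).Count);   // prints 1, stale until the next poll
        cache.Poll();
        Console.WriteLine(cache.GetFriends(1).Count);   // prints 2
    }
}
```

Two things fall out of this. Every call to Poll costs a trip to the database whether or not anything changed, and between polls the cache can serve stale data. With explicit invalidation, as in the memcached version of AddFriend, the only extra work happens at the moment of the write.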

If you are a developer on the WISC platform and are concerned about getting the best performance out of your Web site, you should take a look at memcached for Win32. The most highly trafficked site on the WISC platform is probably MySpace, and articles about how their platform works, such as Inside MySpace.com, extol the virtues of moving work out of the database and relying on cache servers.


 

Thursday, 05 July 2007 04:40:00 (GMT Daylight Time, UTC+01:00)
I filed a bug/suggestion for ASP.Net to provide a memcached caching model way back in August 2004; unfortunately they decided it was too much work for .NET 2.0.

The original feedback item is here:
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=102245

Maybe for .NET 4.0?
Oren Novotny
Thursday, 05 July 2007 15:04:55 (GMT Daylight Time, UTC+01:00)
You might want to go and ask some of the old COM+ guys about why the IMDB was cut between b3 and RTM of Windows 2000...
JonK
Thursday, 05 July 2007 16:19:24 (GMT Daylight Time, UTC+01:00)
Dare - I found two references to your blog in one day - and came to visit - some good stuff here - but *please* can you update the style sheet of your site to allow browser selectable font size. Your font size is painful to read on a high-res monitor... :-) (never mind the accessibility issues).
Thursday, 05 July 2007 16:20:25 (GMT Daylight Time, UTC+01:00)
"the virtues of moving work out of the database and relying on cache servers."

I wonder if use of cache servers may be overdone in many cases. Caching layers, especially nested caching layers, can add a lot of complexity due to cache consistency issues. Messing that up can reduce reliability.

Moreover, some of these architectures appear to use very large numbers of cache servers. If those machines were devoted to database shards instead, each database shard would have less data, so much more of the active set would be in memory, possibly running so much faster that the caching layer would become largely unnecessary.

In general, I wonder if many are working too hard to avoid accessing the database. In a lot of these situations, I suspect massively distributing the database would be more appropriate.
Thursday, 05 July 2007 16:56:54 (GMT Daylight Time, UTC+01:00)
NHibernate has support for memcached baked in :-)
Anon
Thursday, 05 July 2007 17:45:07 (GMT Daylight Time, UTC+01:00)
There's also at least one commercial distributed caching system for .NET.

http://www.scaleoutsoftware.com/products/stateServer/index.html
Thursday, 05 July 2007 19:45:49 (GMT Daylight Time, UTC+01:00)
I was just researching this topic and it showed up on programming.reddit.com. Great timing.

Have you actually implemented the memcached solution? I have a site with one server that is beginning to pick up traffic. We use sql 2005 so we can take advantage of query notifications. However, there are various issues with query invalidation and certain rules (no select *, top, outer joins, etc) you must follow to get the query notifications to work.

Memcached might be able to take care of a lot of that but I would like to hear more about your experience (or any asp.net developer's experience) with memcached.

The asp.net caching solution works pretty well at this point but I can see advantages in using memcached.
Friday, 06 July 2007 01:56:51 (GMT Daylight Time, UTC+01:00)
My estimate is around a free month to sit down and write up a piece that will wipe NHibernate's roundtripping bottleneck of a trash design off the floor.

Memcache included, and you get to hit the database like you should.

Roberta
Friday, 06 July 2007 08:48:02 (GMT Daylight Time, UTC+01:00)
@Greg,

You may want to read http://danga.com/words/2005_oscon/oscon-2005.pdf. It explains why there are fewer db servers and more memcached servers later in the life of a massive web application.
Friday, 06 July 2007 23:38:57 (GMT Daylight Time, UTC+01:00)
You really can't even compare memcached to asp.net caching - one caters to distributed caching, while the other focuses on local caching so that it can deal with things like dependencies.

MySpace actually evaluated memcached when they were investigating this area, but eventually rolled their own using the same concepts as memcached. Their needs were a little more customized than what the tools offered out of the box.

You're definitely right about memcached not offering the same level of features as ASP.NET caching, but that's because it's an entirely different kind of tool. All it does is store key/value pairs and retrieve data by key, like a Hashtable in .net.

Of course, it has the bigger upside of being distributed, horizontally scalable, and it won't duplicate data across each machine like asp.net.

The most important thing to remember when using memcached with asp.net is to have a great library. There are a few open source ones out there, but I had to write my own because I just wasn't getting the performance I needed out of them. The biggest bottlenecks when using it will be 1) network latency and 2) serialization.

I highly, highly, highly encourage you to roll your own serialization and use byte arrays - my current library is 3-5x faster than asp.net binary serialization, and over 100x faster when de-serializing.

If done properly, expect to see as much as 95% of your database traffic disappear, depending on how insert/update heavy your app is.
Monday, 09 July 2007 09:34:25 (GMT Daylight Time, UTC+01:00)
In the Java world, the main commercial solution is Tangosol's Coherence.
Tuesday, 10 July 2007 08:26:34 (GMT Daylight Time, UTC+01:00)
If you want the automatic distribution and load-balancing features with the ASP.Net Cache model, you should REALLY look into nCache from Alachisoft. This is serious stuff, that Microsoft really ought to buy.

http://www.alachisoft.com/ncache/index.html

disclaimer: Just a user of this awesome tool.
Friday, 13 July 2007 18:00:29 (GMT Daylight Time, UTC+01:00)
Dare,

You are looking in the wrong places:

http://svn.castleproject.org:8080/svn/castlecontrib/caching/trunk/src/Castle.Components.Cache.Memcached/

.NET 2.0 introduced a provider model for many things in ASP.NET that were not extensible in 1.1. I don't think cache was one of them and it is a DAMN SHAME!

The ideal would be if ASP.NET cache were a provider model implementation just like the Membership and Role stuff, just like ADO.NET. I should be able to implement ICache and replace the default ASP.NET backing store. Instead ASP.NET Cache is black magic with in process and Service Cache as well as ability to use SQL Server as the cache (at which point I'm not sure what the point is).

I always wanted to write a cache provider that uses memcached as the backend. It has been on my free-time todo list for 2 years. I was happy when I saw ncache existed, but with memcached being free, ncache hardly makes sense for many small applications.

Thanks for your post and bringing more awareness to memcached in the .net world.