Functional Programming in C# 3.0: How Map/Reduce/Filter can Rock your World

June 16, 2008

@ 12:03 PM

As a developer who was raised on procedural and object oriented programming languages like C, C++ and Java it took me a while to figure out what people were raving about when it comes to the benefits of functional programming techniques. I always thought closures and higher order functions were words used by snobby kids from MIT and grad students to show how overeducated they were as opposed to programming tools I'd ever find useful.

This thinking was additionally fueled by articles like Joel Spolsky's Can Your Programming Language Do This? which not only came off as snobby but also cemented the impression that higher order functions like map() and reduce() are for people solving "big" problems like the folks at Google who are trying to categorize the entire World Wide Web not people like me who write desktop feed readers in their free time.

All of this changed when I started learning Python.

With Python I started writing programs that threw around lambda functions and used list comprehensions to map, reduce and filter without even thinking twice about it. Afterwards when I'd go back to programming in C# 2.0 I'd marvel at how much more code it took to get things done. There were tasks which I could perform in a line of Python code that took four, five sometimes up to ten lines of C# code. I began to miss Python sorely.

Then I installed Visual Studio 2008 and got to use the Language Integrated Query (LINQ) features of C# 3.0 and was blown away. The C# folks had not only brought over functional programming constructs like lambda expressions (aka anonymous methods) but also had added the 3 core functions (map, reduce and filter) to all lists, collections and other implementers of the IEnumerable interface. So what are map, reduce and filter? They are higher order functions [which means they take functions as input] that operate on lists of objects. Here are their definitions from the Python documentation along with links to their C# 3.0 equivalents.

Function name in Python Description from Python Documentation C# 3.0 Equivalent

map Apply function to every item of iterable and return a list of the results. Enumerable.Select

reduce (aka fold or accumulate) Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. Enumerable.Aggregate

filter Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. Enumerable.Where

With these three building blocks, you could replace the majority of the procedural for loops in your application with a single line of code. C# 3.0 doesn't just stop there. There are also a number of other useful higher order functions available on all enumerable/collection objects.

In the next version of RSS Bandit, we will support synchronizing your subscription state from Google Reader, NewsGator Online and the Common Feed List provided by the Windows RSS platform. This means that when the user hits [Update All Feeds] to refresh their subscriptions we need to (i) aggregate the unread item count across the different feed sources and store it (ii) ask each feed source to kick off its update process and (iii) on completion of the update determine if there are new items by recalculating the unread count across all feed sources and see if it differs from the value we got in the first step. Here's what the UpdateAllFeeds() method looks like

        public void UpdateAllFeeds(bool force_download)
        {
            List<SubscriptionRootNode> rootNodes = this.GetAllSubscriptionRootNodes();
            if (rootNodes != null)
            {
                if (_timerRefreshFeeds.Enabled)
                    _timerRefreshFeeds.Stop();
                _lastUnreadFeedItemCountBeforeRefresh = rootNodes.Sum(n => n.UnreadCount);
                FeedSources.Sources.ForEach(s => s.RefreshFeeds(force)); 
            }
        }

In the UpdateAllFeeds() method we use Enumerable.Sum which is a specialized reduce() function to calclulate the unread count of each of the different subscription sources. Then we use a ForEach extension method to effectively loop through each feed source and call its RefreshFeeds() method. That would have been two for loops in older versions of C# or Java.

We also perform more complicated reduce or fold operations which go outside the norm of just accumulating some numeric value in RSS Bandit. When a user subscribes to a new feed, we populate a drop down list with the list of categories from the user's subscriptions so the user can decide which category to place the feed in. With multiple feed sources, we need to populate the drop down with the list of categories used in Google Reader, NewsGator, the Windows Common Feed List as well as those within RSS Bandit while taking care to eliminate duplicates. The GetCategories() method shown below does the bulk of that work in a single line of code via Enumerable.Aggregate

public IEnumerable<string> GetCategories()
{
  //list containing default category used for bootstrapping the Aggregate function
  var c = new List<string>();
  c.Add(DefaultCategory);
  IEnumerable<string> all_categories = c; 

  //get a list of the distinct categories used across all feed sources
  all_categories = FeedSources.Sources.Aggregate(all_categories, 
                                 (list, s) => list.Union(s.Source.GetCategories().Keys, StringComparer.InvariantCultureIgnoreCase));                        
  return all_categories;
}

The first step is to set up a list with the default category ("Unclassified") and then use Aggregate() to go through each source and perform a union of the current list of categories with the list of categories from that feed source. The categories are compared in a case insensitive manner to remove duplicates from the union. If there are no categories defined in any of the feed sources then only the default category ends up being returned.

When a user is viewing their Google Reader feeds in RSS Bandit, any action the user takes in the application is reflected on the Web. So each time a user marks an item as read, renames a feed title, subscribes or unsubscribes from a feed, a Web request is made behind the scenes to update the user's state on the Web via Google Reader's REST API. Instead of making the Web requests synchronously and possibly tying up the UI I instead add each Web request intended for the Google Reader API to a queue of pending operations. Since the operations may sit in the queue for a few seconds or minutes in the worst case, we can optimize network usage by removing events from the queue if they end up being redundant.

For example. the DeleteFeedFromGoogleReader() method removes every pending operation related to a particular feed if an unsubscribe event is enqueued. After all, there is no point in making Web requests to mark the feed as read or rename it, if the next request from the user is to unsubscribe from the feed. The method uses a filter operation, Enumerable.Where, to determine the events to remove as shown below

       public void DeleteFeedFromGoogleReader(string googleUserID, string feedUrl)
        {
            var deleteOp = new PendingGoogleReaderOperation(GoogleReaderOperation.DeleteFeed, 
                                                            new object[] {feedUrl},googleUserID);

            lock (pendingGoogleReaderOperations)
            {
                //remove all pending operations related to the feed since it is going to be unsubscribed
                IEnumerable<PendingGoogleReaderOperation> ops2remove 
                  = pendingGoogleReaderOperations.Where(op => op.GoogleUserName.Equals(deleteOp.GoogleUserName)
                                                        && op.Parameters.Contains(feedUrl));

                foreach (PendingGoogleReaderOperation op2remove in ops2remove)
                {
                    pendingGoogleReaderOperations.Remove(op2remove); 
                }

                pendingGoogleReaderOperations.Add(deleteOp);   
            }
        }

There are more examples from the RSS Bandit code base but I'm sure you get the idea. The point is that functional programming techniques give you the ability to get more bang for your buck (where bucks are lines of code) even when performing the most mundane of tasks.

If your programming language doesn't support lambda functions or have map/reduce/filter functions built in, you just might be a Blub Programmer who is missing out on being more productive because your programming language doesn't support "esoteric" or "weird" features.

My next step is to spend more time with Lisp. Wish me luck. :)

Now Playing: Lil Wayne - Lollipop (remix) (feat. Kanye West)

Categories: Programming

« Branding: You Can't Be Everything to Eve... | Home | How Do You Find the Best Employees for Y... »

Monday, 16 June 2008 15:06:58 (GMT Daylight Time, UTC+01:00)

Nice article. I can't wait until they let us use .NET 3.5 and LINQ at work, but for now, I'm stuck on 2.0, and we've only just upgraded from 1.1.

I still wanted to use Functional Programming in .NET, though, so I figured out how to do it in .NET 2.0, using Generics. Filter is pretty much built in using the .FindAll(Predicate<T>) method, but Map and Reduce were incredibly easy themselves.

I wrote an article on it myself a few months ago, which you can find here:

Half-Assed Functional Programming in .NET 2.0

Chris Doggett

Monday, 16 June 2008 16:28:07 (GMT Daylight Time, UTC+01:00)

Dare, how are you working with Google Reader? Is there a published API or are you just reverse engineering it?

Monday, 16 June 2008 16:37:29 (GMT Daylight Time, UTC+01:00)

jb,
Unofficially sanctioned reverse engineering based on the docs at http://code.google.com/p/pyrfeed/wiki/GoogleReaderAP. I say unofficially sanctioned because members of the Google Reader team are aware of this effort and have even fixed issues I've reported in my blog.

Dare Obasanjo

Monday, 16 June 2008 19:53:27 (GMT Daylight Time, UTC+01:00)

In case you were interested, I have started a library on codeplex called Dizzy whose goal is to fill in a lot of the missing higher order methods that Microsoft didn't implement. There are many such as Cycle, Repeat, Chain, Partition, etc... that are very useful.

Justin Etheredge

Monday, 16 June 2008 22:38:48 (GMT Daylight Time, UTC+01:00)

I love the new Enumerable extensions methods, I get so much more done with them. I can write an alg that looks good and easy to read. After it's working if it's not efficient enough because the logic is more complicated then the functions allow I can rewrite it with ease.

BlindWanderer

Tuesday, 17 June 2008 03:32:52 (GMT Daylight Time, UTC+01:00)

Hi Dare,

A few months ago I decided to learn LINQ and I read a lot about people raving about the same functional "features". At some point I thought, hey if people think LINQ is great because of those features then why not go all out and do functional programming. And that's how I found a F#!

Erik

Tuesday, 17 June 2008 08:37:51 (GMT Daylight Time, UTC+01:00)

I love to see posts like this! Dare, if you're serious about the LISP thing, visit http://mitpress.mit.edu/SICP/ and read what those snobby kids from MIT did their freshman year when they were all noobs just like the rest of us. ;)

Joe Chung

Tuesday, 17 June 2008 10:09:46 (GMT Daylight Time, UTC+01:00)

So true. I use C++ for work, and C# 3.0 for hobby. I've been slowly learning LINQ for objects and holy cow it makes writing algorithms so much simpler. Those algorithms that scan a list for the smallest or largest item? You know -- where you have loops that track which index was the previous smallest itme -- now trivial usign the orderby operator in LINQ. The code is much simpler to read as well.

I see C++ is adding features like lambda's to the language (at least, it is proposed). So perhaps we'll start to get libraries that do the same. It won't be quite as elegant as python or C# 3.0 - but it is getting there.

[My day job will never switch from C++ - 10 year old code base, needs to last another 20 years or so - particle physics experiment (ATLAS) - so don't ask :-)].

Gordon Watts

Tuesday, 17 June 2008 17:35:43 (GMT Daylight Time, UTC+01:00)

I am an amateur Lisp programmer and I highly recommend the book by Peter Seibel as a first text on Lisp. Practical Common Lisp.

Jonas Gorauskas

Comments are closed.

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Functional Programming in C# 3.0: How Map/Reduce/Filter can Rock your World - Dare Obasanjo's weblog

Function name in Python	Description from Python Documentation	C# 3.0 Equivalent
map	Apply `function` to every item of `iterable` and return a list of the results.	Enumerable.Select
reduce (aka fold or accumulate)	Apply `function` of two arguments cumulatively to the items of `iterable`, from left to right, so as to reduce the iterable to a single value. For example, `reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])` calculates `((((1+2)+3)+4)+5)`. The left argument, `x`, is the accumulated value and the right argument, `y`, is the update value from the `iterable`. If the optional `initializer` is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty.	Enumerable.Aggregate
filter	Construct a list from those elements of `iterable` for which `function` returns true. `iterable` may be either a sequence, a container which supports iteration, or an iterator.	Enumerable.Where