Jeff Atwood has a blog post entitled Sorting for Humans : Natural Sort Order where he writes

The default sort functions in almost every programming language are poorly suited for human consumption. What do I mean by that? Well, consider the difference between sorting filenames in Windows explorer, and sorting those very same filenames via Array.Sort() code:

Implementing a natural sort is more complex than it seems, and not just for the gnarly i20n issues I've hinted at, above. But the Python implementations are impressively succinct

I tried to come up with a clever, similarly succinct C# 3.0 natural sort implementation, but I failed. I'm not interested in a one-liner contest, necessarily, but it does seem to me that a basic natural sort shouldn't require the 40+ lines of code it takes in most languages.

Since I’m still in my “learning Python by mapping it to C#” phase I thought this should be a straightforward task. Below is the equivalent IronPython code for natural sort which is slightly modified from the code posted in Jeff’s post along with what I hoped to be a succint version in C# 2.0. It would definitely be shorter in C# 3.0 [which I don’t plan to start using for another year or so]. The Python snippet below takes advantage of some interesting rules around comparing lists of objects which don’t exist in C#. I’m sure I could reduce the size of the C# code while maintaining readability but my procrastination time is over and I need to get to work. Wink

Natural Sort in IronPython

import re

def sort_nicely( l ):
  """ Sort the given list in the way that humans expect. """
   
  convert = lambda x: x.isdigit() and int(x) or x
  alphanum = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ]
  l.sort( key=alphanum ) #serious magic happens here
  return l

print sort_nicely(["z22.txt", "z5.txt" , "z.txt", "z10.txt", "z300.txt", "z2.txt", "z11.txt", "y.txt", "z", "z4.txt", "za.txt" ])

Natural Sort in C# 2.0


using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;


public class Test {


   ///<summary>Compare two lists of strings using Python rules and natural order semantics</summary>
  public static int NaturalCompare(IList<string> a, IList<string> b) {
    int y, z, len = (a.Count < b.Count ? a.Count : b.Count);

    for (int i = 0; i < len; i++) {
      if (a[i].Equals(b[i])) continue;

      bool w = Int32.TryParse(a[i], out y), x = Int32.TryParse(b[i], out z);
      bool bothNumbers = w && x, bothNotNumbers = !w && !x;

      if (bothNumbers) return y.CompareTo(z);
      else if (bothNotNumbers) return a[i].CompareTo(b[i]);
      else if (w) return -1;
      else return 1; //numbers always less than words or letters
    }
    return (a.Count.CompareTo(b.Count)); //subset list is considered smaller 
  }

  public static List<string> SortNicely(List<string> list) {
    Regex re
= new Regex("([0-9]+)");
    list
.Sort(delegate(string x, string y) { return NaturalCompare(re.Split(x), re.Split(y)); });
    return list;
  }


  public static void Main(string[] args) {
    List<string> l = new List<string>(new string[] { "z.txt", "y.txt", "z22.txt", "z5.txt", "z10.txt", "z3.txt", "z2.txt", "za.txt", "z11.txt", "z400.txt" });
    foreach (string s in SortNicely(l)) Console.WriteLine(s);
  }
}

Now playing: Notorious B.I.G. - Real N*ggas Do Real Things


 

Thursday, 13 December 2007 20:10:24 (GMT Standard Time, UTC+00:00)
I thought that was a Jay-Z song.
Thursday, 13 December 2007 20:27:53 (GMT Standard Time, UTC+00:00)
Jay-Z reused a hook from Biggie's underground days. You should listen to Camron's "Biter" for all the Biggie lines Jigga has stolen across his illustrious career.
Thursday, 13 December 2007 23:04:23 (GMT Standard Time, UTC+00:00)
Hi Dare,

Just in case you're interested, here's my own attempt at approximating "natural sort" in one line of C# 3.0 :

public static List&lt;string> SortNicely(List&lt;string> list)
{
var re = new Regex("[0-9]+");
return list.OrderBy(s => re.Replace(s, m => m.Value.PadLeft(8, '0'))).ToList();
}

It's an approximation in the sense that it gives up (gracefully) on sorting numbers that are more than 8 digits long, but I don't think this could be an issue in a real-world situation... and your C# code doesn't handle them either :)

Would you care to elaborate on why you don't plan to use C# 3.0 for a year ? Given that there is no runtime issue... even code like I wrote above can run on a vanilla 2.0 system with a tiny bit of help from LINQBridge !

Cheers,
--Jonathan
Friday, 14 December 2007 03:33:29 (GMT Standard Time, UTC+00:00)
Using a larger font face for C#... sneaky.
Mike
Saturday, 15 December 2007 06:38:19 (GMT Standard Time, UTC+00:00)
just some note on the python version:
* as you use regular expressions, it's better to compile them, 'cause they will be faster if you do more than one sort
* the method list.sort modify the list in place! use sorted(list) instead
* this is a matter of style, but I prefer
value if cond else othervalue
instead of short-circuit evaluation...
so, width little touch to your function, I give you this one-liner :D

import re

def sort_nicely(l, splitter=re.compile("(\d+)")):
""" Sort the given list in the way that humans expect. """
return sorted(l, key=lambda name: (
int(el) if el.isdigit() else el for el in splitter.split(name)))

l = ["z22.txt", "z5.txt", "z.txt", "z10.txt", "z300.txt", "z2.txt", "z11.txt",
"y.txt", "z", "z4.txt", "za.txt"]
print sort_nicely(l)
ZeD
Sunday, 16 December 2007 14:40:40 (GMT Standard Time, UTC+00:00)
Ian Griffiths also took a stab at it here:
http://www.interact-sw.co.uk/iangblog/2007/12/13/natural-sorting
Dilip
Comments are closed.