About a year ago, I decided to take a serious look at radix sort — one of those algorithms that everyone covers in school and then forgets, because everyone knows that it only works for some narrow special cases with lots of edges.
Turns out that you can make radix sort very general, with base behaviour equal to production-quality quicksort, and typical speed around five times that of quicksort. In a nutshell, it can be done with a MSB-first radix sort, that scans keys 16 bytes at a time, building statistics, then performs the usual radix-sort scatter and gather passes. Each pass splits a set of runs into a larger set of smaller runs, and very small runs are sorted with insertion sort.
For a full description with pictures, check out this GoogleDoc. Still haven’t decided whether to release the source code under GPL or BSD license; more to come.