Why is Quicksort that quick?

Quicksort has a reputation for speed. In school, you learn that it is O(n log n) on average, the optimum for single-threaded, comparison-based sorts. In practice, when combined with median-of-three pivot selection, insertion sort for tiny partitions, and special equal-keys handling (see Top 3 Quicksort optimizations), it is faster than its closest comparison-based competitors, Merge sort and Heapsort. (That doesn’t hold for nearly-sorted input, but that’s another story.) Timsort makes the best of Merge sort with similar extensions, covering more edge cases than the quicksort add-ons above.
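
For concreteness, here is a minimal sketch in C of the first two of those optimizations (median-of-three pivots and an insertion-sort cutoff), sorting plain ints. The cutoff of 16 and the Lomuto-style partition are generic choices, not the tuned code behind any measurements here, and fat-pivot equal-keys handling is omitted.

#include <stddef.h>

enum { CUTOFF = 16 };           /* typical small-partition threshold */

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Straight insertion sort of v[lo..hi] -- cheap for tiny partitions. */
static void insertion_sort(int *v, size_t lo, size_t hi)
{
    for (size_t i = lo + 1; i <= hi; ++i) {
        int key = v[i];
        size_t j = i;
        while (j > lo && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
        v[j] = key;
    }
}

/* Order v[lo], v[mid], v[hi]; park the median at v[hi]; Lomuto-partition. */
static size_t partition(int *v, size_t lo, size_t hi)
{
    size_t mid = lo + (hi - lo) / 2;
    if (v[mid] < v[lo]) swap(&v[mid], &v[lo]);
    if (v[hi]  < v[lo]) swap(&v[hi],  &v[lo]);
    if (v[hi]  < v[mid]) swap(&v[hi], &v[mid]);
    swap(&v[mid], &v[hi]);                  /* median-of-three becomes the pivot */

    int pivot = v[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; ++j)
        if (v[j] < pivot) swap(&v[i++], &v[j]);
    swap(&v[i], &v[hi]);                    /* pivot into its final place */
    return i;
}

static void quicksort(int *v, size_t lo, size_t hi)
{
    while (lo < hi) {
        if (hi - lo < CUTOFF) {             /* tiny partition: insertion sort */
            insertion_sort(v, lo, hi);
            return;
        }
        size_t p = partition(v, lo, hi);
        /* Recurse into the smaller side, iterate on the larger: O(log n) stack. */
        if (p - lo < hi - p) {
            if (p > lo) quicksort(v, lo, p - 1);
            lo = p + 1;
        } else {
            quicksort(v, p + 1, hi);
            if (p == lo) return;            /* nothing left; avoids size_t underflow */
            hi = p - 1;
        }
    }
}

void sort_ints(int *v, size_t n) { if (n > 1) quicksort(v, 0, n - 1); }

Fat-pivot (three-way) partitioning would slot into partition(); it only pays off when keys repeat heavily.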

But it puzzled me that Quicksort was often much faster than, say, a heap-with-hole sort (an optimization of simple Heapsort). Then data-oriented-design ideas rang a bell: real-world keys are often complex data structures, with pointers to key parts scattered across the heap, and getting all those parts from RAM into cache is a dominating cost (a sketch of such a key follows the list below). Here are the pertinent features of Quicksort:

  • all comparisons in a partitioning pass are against one (pivot) key. No matter how scattered the pivot’s key parts are in RAM, after the first comparison they are guaranteed to be sitting in L1 cache.
  • as Quicksort recurses into smaller and smaller partitions, it eventually reaches one small enough that all of its scattered key parts fit in cache. From there on down, every child recursion works with keys that are entirely in cache. This happens again at each level as Quicksort works its way down through the L3/L2/L1 cache sizes, and it applies to the bottom-level insertion sorts as well.
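
To make the “scattered key parts” point concrete, here is a hypothetical record type in C; the layout and field names are illustrative, not from any particular codebase:

/* A hypothetical "complex key": its parts live behind pointers, scattered
 * across the heap, so a single comparison may touch several cache lines. */
#include <string.h>

typedef struct {
    const char *last_name;    /* heap-allocated string */
    const char *first_name;   /* heap-allocated string */
    const int  *account_id;   /* points into some other table */
} Record;

/* Compare records part by part. In a Quicksort partitioning pass, "b" is
 * always the pivot: after the first call its strings and id are already in
 * L1, so only "a"'s scattered parts can cost a cache miss. */
static int record_cmp(const Record *a, const Record *b)
{
    int c = strcmp(a->last_name, b->last_name);
    if (c) return c;
    c = strcmp(a->first_name, b->first_name);
    if (c) return c;
    return (*a->account_id > *b->account_id) - (*a->account_id < *b->account_id);
}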

Next up: how radix sort can beat Quicksort on its home (cache) turf, for real-world (complex) keys.


Bug fix for ACISM

Several people have used the Aho-Corasick interleaved state-transition matrix code, only to find that it didn’t match all strings (!).
The bug was in the machine construction: the “prune backlinks” step in acism_create is, as written, broken. I commented it out, and voilà, it works.
I will update github once I’ve sorted out an SSL cert issue 😛
Thanks to everyone who took that code and stumbled on that bug, for their patience; and special thanks to perfgao@gmail.com, who came up with the tiny example that pinpointed the problem.

FWIW, the prune-backlinks code will return; but before pruning a link, it will check that the entire subtree of the backlink’s target, not just the target’s immediate children, is a subset of the source’s subtree.
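
My reading of that condition, sketched over a toy explicit trie; this is not ACISM’s interleaved representation, and the node type and function names here are hypothetical:

typedef struct node {
    struct node *child[256];   /* toy trie: one child per symbol, or NULL */
    int          is_match;     /* nonzero if some pattern ends at this node */
} Node;

/* Return 1 iff every path (and every match) in sub's subtree also exists
 * under sup -- i.e. sub's subtree is a subset of sup's subtree. */
static int subtree_is_subset(const Node *sub, const Node *sup)
{
    if (sub->is_match && !sup->is_match)
        return 0;
    for (int sym = 0; sym < 256; ++sym)
        if (sub->child[sym] &&
            (!sup->child[sym] ||
             !subtree_is_subset(sub->child[sym], sup->child[sym])))
            return 0;
    return 1;
}

/* A backlink src -> tgt may be pruned only if the whole-subtree test passes;
 * testing tgt's immediate children alone is what made the old pruning unsafe. */
static int can_prune_backlink(const Node *src, const Node *tgt)
{
    return subtree_is_subset(tgt, src);
}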


GMAKE cheap trick #4 for non-recursive make

One of the tedious sides of non-recursive make is that all makefiles share the same namespace. Every variable or pseudo-target name must either be unique across all subprojects, or be a well-known global variable/target to which each subproject’s bits and pieces are appended. Appending to global variables doesn’t always have the desired effect, either.

For example, these kinds of rules/assignments work (assuming EXPORT_PGMS is used and owned by global_rules.mk):

include global_rules.mk
EXPORT_PGMS += rsaencrypt
RSAENCRYPT_testfiles = foo.dat bar.dat

all : rsaencrypt
clean .PHONY: rsaencrypt.clean
rsaencrypt.clean :; rm -f ${RSAENCRYPT_testfiles}

But these do not:

CFLAGS = -g # problem: previous CFLAGS wiped out
CFLAGS += -Imock # oops: now ALL subprojects see this flag
clean:; rm -f test.out # fails; "clean:" actions can only be defined once
clean::; rm -f test.out # fails if "clean:" is defined anywhere.

A second tedious side is that the current working directory will not be the one where the Makefile and source files live. That means that every file reference must have a directory path. The obvious change makes the makefile harder to read, and hence more prone to mistakes …

rsaencrypt.source = ${rsaencrypt}/alpha.c ${rsaencrypt}/beta.c ${rsaencrypt}/gamma.c \
                    ${rsaencrypt}/delta.c ${rsaencrypt}/epsilon.c \
                    ${rsaencrypt}/digamma.c

One approach is:

rsaencrypt.source = alpha.c beta.c gamma.c delta.c epsilon.c digamma.c
rsaencrypt_source.zip : ${rsaencrypt.source:%=${rsaencrypt}/%}
# ... or without the extra variable:
rsaencrypt_source.zip : $(addprefix ${rsaencrypt}/, alpha.c beta.c gamma.c delta.c epsilon.c digamma.c)

However, there’s a little abbreviation trick that can help. All of my makefiles begin like this:

~ := rsaencrypt
. := $(word 1,${$~} .)

There’s nothing special about the choice of characters; any of $! $- $, $` $_ will also work. The object is to have an unobtrusive single-character abbreviation, avoiding letters, digits and gmake’s own $@ $< $^ $* $+ $? : = # .

The $(word...) expression sets $. to “.” if no parent makefile has set ${rsaencrypt}. Long variable names whose only job is to follow cross-project naming conventions can be written with a “$~” prefix instead:

$~source := $./alpha.c $./beta.c $./gamma.c $./delta.c $./epsilon.c $./digamma.c
$~source.zip : ${$~source}

Note the := (not =) in the definitions of $~ and $.: as sub-makefiles are included, those definitions change from that point on. That also means include ${subdir}/submakefile lines must come at the bottom of the makefile, and successive include directives can’t use $. for the pathname. Well, nothing’s perfect. 🙂


GMAKE cheap trick #3

What’s the shortest way to define a GMAKE var that is exactly one space?

SPACE = $~ #

(Assuming ~ is unset or empty here: gmake strips the leading whitespace after the “=”, the empty $~ anchors the value, and the “#” comment marks where it ends, so what remains is exactly the one space between $~ and #.)


Quick update on ACISM

Thanks to Nicholas Sinnott-Armstrong, who suggested that the fast and compact Aho-Corasick interleaved state machine on github could use an upgrade from 32-bit to 64-bit, to handle (say) over 40M strings, a bit beyond 32-bit ACISM’s 16MB limit. He tried doing that, and with a small performance tweak, I’ve updated github. The state machine is twice as large, of course; on a 64-bit processor it should be a couple of cycles faster, though, because the shifts and masks are now constants. Anybody else using ACISM for pattern sets that large?
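
To illustrate the “constants” point in general terms (this is not ACISM’s actual cell layout; the field width here is made up): when a packed field’s position is fixed at compile time, the compiler can emit immediate shift and mask operands instead of loading them from variables.

#include <stdint.h>

#define SYM_BITS 9                          /* hypothetical field width */
#define SYM_MASK ((1u << SYM_BITS) - 1)

/* Field position known at compile time: shift and mask are immediates. */
static inline uint32_t cell_symbol(uint64_t cell)
{
    return (uint32_t)(cell >> 32) & SYM_MASK;
}

/* Contrast: a run-time layout means both operands come from variables. */
static inline uint32_t cell_symbol_rt(uint64_t cell, unsigned shift, uint32_t mask)
{
    return (uint32_t)(cell >> shift) & mask;
}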


21st Century Programming Languages

When I was a co-op student, I programmed in a language that had lambdas, currying, user-defined operators (including assignment), dynamic compilation, and a wealth of other features. It was neat, efficient and readable. That language was POP-2.

It was written in 1970.


“Unusual uses of SSE2” posted to github

In this month’s frenzy of putting source code out there in a usable form, I’ve posted source to github for the SSE2 implementations of string search, BNDM search, sorting [16] doubles, and bit-matrix transpose; plus some convenience tools for SSE2.
http://github.com/mischasan/sse2
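
As a taste of the approach (a generic illustration, not the code in the repo above): SSE2’s packed-double min/max give a two-instruction compare-exchange, the basic step of a sorting network for doubles.

#include <emmintrin.h>   /* SSE2 intrinsics */

/* Compare-exchange (a[i], b[i]) for i = 0,1: after the call, a[] holds the
 * smaller of each pair and b[] the larger. */
static void cmpxchg2(double a[2], double b[2])
{
    __m128d va = _mm_loadu_pd(a), vb = _mm_loadu_pd(b);
    _mm_storeu_pd(a, _mm_min_pd(va, vb));
    _mm_storeu_pd(b, _mm_max_pd(va, vb));
}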
