Monthly Archives: July 2011

Okay, one more poke at SSE2: sorting doubles

Follow-up: source code is on github: ssesort.c. Old-school (pre-CUDA) non-graphic programming of GPUs dusted off a bunch of classic algorithms that did little or no branching, and no data sharing between processors, but allowed massive parallelism. One of those algorithms … Continue reading

Posted in algorithm | 1 Comment

Radix sort

About a year ago, I decided to take a serious look at radix sort — one of those algorithms that everyone covers in school and then forgets, because everyone knows that it only works for some narrow special cases with … Continue reading

Posted in algorithm, Uncategorized

What is SSE !@# good for? Transposing a bit matrix

If you find this interesting, check out my October post for the full C routine for transposing an arbitrary-sized bit matrix. On an AMD64, it runs at about 10-15x the speed of an efficiently coded non-SSE routine. This is probably my last … Continue reading

Posted in algorithm, Uncategorized | 3 Comments

Convergence: SSE2 and strstr

The original improved strstr routine split the problem up based on the pattern length: 2, 3 and 4+ bytes were separate cases. How about reimplementing the 2- and 3-byte cases using SSE2 functions? The main change is to compare each … Continue reading

Posted in algorithm | 9 Comments

A better strstr, with no asm code

For the ultimate in kickass strstr, see my later post about using SSE2 instructions to get a 10x speedup over non-SSE2 strstr. Unlike the Intel-only SSE4.2 ops, SSE2 is on every x86-compatible chip since the Pentium 4. My original strstr post … Continue reading

Posted in Uncategorized | 2 Comments