Category Archives: algorithm

Bug fix for ACISM

Several people have used the Aho-Corasick interleaved state-transition matrix code, only to find it didn’t match all strings (!) The bug was in the machine construction: the “prune backlinks” step in acism_create is (as-is) broken. I commented it out, and … Continue reading

Posted in aho-corasick, algorithm, string search | Leave a comment

SSE2 bit matrix transpose special case … 8 x 256 … for Marek

It turns out that Marek’s Idea of the Day: Bitsliced SipHash used my SSE2 bit-matrix transpose┬ároutine, but it wasn’t fast enough. This is normally the case for SSE2: the more specific the problem, the better the code can be. I … Continue reading

Posted in algorithm, bit, bit shift, SSE2 | Tagged , , | Leave a comment

SSE2 and BNDM string search

For the past few weeks, I’ve been testing and experimenting with the Railgun string search function written by Sanmayce. Railgun is really a “memmem” function, where the target length is known in advance; and the cost of compiling the pattern … Continue reading

Posted in algorithm, SSE2, string search | Tagged , , , , , , | 8 Comments

Update on bitonic SSE2 sort of 16 doubles

For the complete source code for both sorting and ranking functions using SSE2, check out ssesort.c in this github repo I originally used asm to generate the bitonic sorter. After doing a little more testing, I found that gcc 4.4 … Continue reading

Posted in algorithm | Tagged , , , | 5 Comments

Okay, one more poke at SSE2: sorting doubles

Follow-up: source code is on github: ssesort.c. Old-school (pre-CUDA) non-graphic programming of GPU’s dusted off a bunch of classic algorithms that did little or no branching, and no data sharing between processors, but allowed massive parallelism. One of those algorithms … Continue reading

Posted in algorithm | Tagged , | 1 Comment

Radix sort

About a year ago, I decided to take a serious look at radix sort — one of those algorithms that everyone covers in school and then forgets, because everyone knows that it only works for some narrow special cases with … Continue reading

Posted in algorithm, Uncategorized | Tagged , , | Leave a comment

What is SSE !@# good for? Transposing a bit matrix

If you find this interesting, check out my Oct post for the full “C” routine for transposing an arbitrary-sized bit matrix. On an AMD64, it runs about 10-15x the speed of an efficiently-coded non-SSE routine. This is probably my last … Continue reading

Posted in algorithm, Uncategorized | 1 Comment