Category Archives: SSE2

“Unusual uses of SSE2” posted to github

In this month’s frenzy of putting source code out there in a usable form, I’ve posted source to github for the SSE2 implementations of string search, BNDM search, sorting [16] doubles, and bit-matrix transpose; plus some convenience tools for SSE2. … Continue reading

Posted in bit, bit matrix transpose, bit shift, ffs, SSE2, string search, Uncategorized | 2 Comments

SSE2 bit matrix transpose special case … 8 x 256 … for Marek

It turns out that Marek’s Idea of the Day: Bitsliced SipHash used my SSE2 bit-matrix transpose routine, but it wasn’t fast enough. This is normally the case for SSE2: the more specific the problem, the better the code can be. I … Continue reading

Posted in algorithm, bit, bit shift, SSE2 | Leave a comment

SSE2 beats SSE4.2 in memcmp?

At the moment I haven’t any box where I can test the latest GCC compilers and SSE4.2 support (pcmpestri etc). So far, the following beats gcc 4.4 with -march=corei7 -msse4.2 (okay, perhaps that’s redundant :-). But gcc generates “repz cmpsb” … Continue reading

Posted in ffs, SSE2, SSE4.2, string search | 2 Comments

SSE2 bit shift

For some reason, there are ridiculously many Google hits on this blog for the string “SSE2 bit shift”. I didn’t post verbatim code for this, just cpp hints on addressing the silly const-arg-ness of counts for SSE2 shift operators. … Continue reading

Posted in bit, bit shift, SSE2, Uncategorized | Leave a comment
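
For readers who land here looking for a run-time bit shift of a whole SSE2 register, here is a minimal sketch. It is not the code from the post above; the name `shl128_bits` is hypothetical. It assumes a shift count from 0 to 63 and sidesteps the immediate-count forms by using `_mm_sll_epi64` and `_mm_srl_epi64`, which take the count in a register. The only constant shift left is the 8-byte `_mm_slli_si128` that carries the low half into the high half.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: shift a 128-bit value left by n bits, 0 <= n <= 63.
 * The per-lane shifts take their count from a register, so n may be a
 * run-time value. */
static __m128i shl128_bits(__m128i x, unsigned n)
{
    /* Move the low 64 bits into the high lane (byte shift, constant 8). */
    __m128i carry = _mm_slli_si128(x, 8);

    /* Keep only the n bits that cross the lane boundary. A count of 64
     * (the n == 0 case) yields zero, which is what we want. */
    carry = _mm_srl_epi64(carry, _mm_cvtsi32_si128((int)(64 - n)));

    /* Shift both 64-bit lanes left by n, then merge in the carried bits. */
    x = _mm_sll_epi64(x, _mm_cvtsi32_si128((int)n));
    return _mm_or_si128(x, carry);
}
```

Shifts of 64 or more would need a byte shift first, which is where the constant-argument restriction bites; that case is left out of this sketch.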

SSE2 and BNDM string search

For the past few weeks, I’ve been testing and experimenting with the Railgun string search function written by Sanmayce. Railgun is really a “memmem” function, where the target length is known in advance; and the cost of compiling the pattern … Continue reading

Posted in algorithm, SSE2, string search | 8 Comments

The Generic SSE2 Loop

In response to a couple of comments on my post about find-first-bit-set in SSE2 registers, amounting to “what use is a routine that only does 16-byte bitvecs”, I thought I’d post the canonic, generic loop through memory using SSE2 ops. … Continue reading

Posted in ffs, SSE2, Uncategorized | 5 Comments
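
As a rough illustration of what a loop through memory with SSE2 ops tends to look like, here is a minimal sketch. It is not the loop from the post; the function name `first_nonzero_byte` and the choice of task (finding the first nonzero byte) are assumptions made for the example. It uses unaligned loads for the 16-byte blocks and a plain scalar loop for the tail.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: index of the first nonzero byte in buf[0..len),
 * or len if every byte is zero. */
static size_t first_nonzero_byte(const unsigned char *buf, size_t len)
{
    const __m128i zero = _mm_setzero_si128();
    size_t i = 0;

    /* Main loop: one unaligned 16-byte load per iteration. */
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));

        /* Bit k of mask is set when byte k of the block equals zero. */
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero));
        if (mask != 0xFFFFu) {
            unsigned nz = ~mask & 0xFFFFu;  /* bits now mark nonzero bytes */
            size_t k = 0;
            while (!(nz & 1u)) {            /* portable find-first-set */
                nz >>= 1;
                ++k;
            }
            return i + k;
        }
    }

    /* Scalar tail for the last len % 16 bytes. */
    for (; i < len; ++i)
        if (buf[i])
            return i;
    return len;
}
```

The same skeleton (block loop, per-block mask, scalar tail) carries over to other byte-wise scans; only the comparison inside the loop changes.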