## SSE2 bit shift

For some reason, this blog gets a ridiculous number of Google hits for the string “SSE2 bit shift”. I didn’t post verbatim code for this, just cpp (C preprocessor) hints on addressing the silly const-arg-ness of counts for SSE2 shift operators: the shift count has to be a compile-time constant. I’m really curious whether people are coming here and not finding what they need. Drop me a comment if that’s you.

And since it’s Christmas, here’s the Perl code that generates the “C” code for a left shift of a full 128-bit register by a variable bit count. The fastest code has a 129-way switch statement: one case for each count from 0 to 127, plus a default. The Perl code generates different sequences of 1, 2, 5 or 6 register instructions, implementing the various shifts. The following are the kinds of cases it generates:

    case 0: break;
    case 1: x = _mm_or_si128(_mm_slli_epi64(x, 1),
                             _mm_srli_epi64(_mm_slli_si128(x, 8), 64-1));
            break;
    ...
    case 8: x = _mm_slli_si128(x, 1); break; // multiples of 8
    case 9: x = _mm_or_si128(_mm_slli_epi64(_mm_slli_si128(x, 9/8), 9%8),
                             _mm_srli_epi64(_mm_slli_si128(x, 8+9/8), 64-9%8));
            break;
    ...
    case 65: x = _mm_slli_epi64(_mm_slli_si128(x, 65/8), 65%8); break;
    ...
    default: x = _mm_setzero_si128(); break;
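
To see why some counts need an extra shift-and-OR, here is a rough scalar model in plain C. It is only a sketch: the `u128` struct and the helpers `lane_shl`, `lane_shr`, `byte_shl`, `model_shl`, `ref_shl` and `count_mismatches` are names made up for illustration, standing in for `_mm_slli_epi64`, `_mm_srli_epi64` and `_mm_slli_si128`. The point it tries to show is that the per-lane shifts drop the bits that should cross from the low 64-bit lane into the high one, so those bits are recovered separately and ORed back in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for an SSE2 register: two 64-bit lanes. */
typedef struct { uint64_t lo, hi; } u128;

/* Models _mm_slli_epi64: each lane shifts left on its own, n in 0..63. */
static u128 lane_shl(u128 x, unsigned n)
{
    assert(n < 64);
    u128 r = { x.lo << n, x.hi << n };
    return r;
}

/* Models _mm_srli_epi64: each lane shifts right on its own, n in 1..63. */
static u128 lane_shr(u128 x, unsigned n)
{
    assert(n >= 1 && n < 64);
    u128 r = { x.lo >> n, x.hi >> n };
    return r;
}

/* Models _mm_slli_si128: the whole register shifts left by 0..15 bytes. */
static u128 byte_shl(u128 x, unsigned bytes)
{
    assert(bytes < 16);
    unsigned bits = 8 * bytes;
    u128 r = x;
    if (bits >= 64) {
        r.lo = 0;
        r.hi = x.lo << (bits - 64);
    } else if (bits > 0) {
        r.lo = x.lo << bits;
        r.hi = (x.hi << bits) | (x.lo >> (64 - bits));
    }
    return r;
}

static u128 or128(u128 a, u128 b)
{
    u128 r = { a.lo | b.lo, a.hi | b.hi };
    return r;
}

/* Picks the same kind of sequence per count as the sample cases do. */
static u128 model_shl(u128 x, unsigned n)
{
    u128 zero = { 0, 0 };
    if (n == 0)   return x;
    if (n >= 128) return zero;
    if (n < 8)      /* like case 1: lane shift, plus bits carried out of the low lane */
        return or128(lane_shl(x, n), lane_shr(byte_shl(x, 8), 64 - n));
    if (n % 8 == 0) /* like case 8: a pure byte shift */
        return byte_shl(x, n / 8);
    if (n >= 64)    /* like case 65: the low lane is already empty, nothing to carry */
        return lane_shl(byte_shl(x, n / 8), n % 8);
    /* like case 9: byte shift, lane shift, and the carried bits */
    return or128(lane_shl(byte_shl(x, n / 8), n % 8),
                 lane_shr(byte_shl(x, 8 + n / 8), 64 - n % 8));
}

/* Independent reference: shift the 128-bit value one bit at a time. */
static u128 ref_shl(u128 x, unsigned n)
{
    for (unsigned i = 0; i < n && i < 128; i++) {
        x.hi = (x.hi << 1) | (x.lo >> 63);
        x.lo <<= 1;
    }
    return x;
}

/* Number of counts in 0..130 where the model and the reference disagree. */
static unsigned count_mismatches(u128 x)
{
    unsigned bad = 0;
    for (unsigned n = 0; n <= 130; n++) {
        u128 a = model_shl(x, n), b = ref_shl(x, n);
        bad += (a.lo != b.lo) || (a.hi != b.hi);
    }
    return bad;
}
```

For example, with `lo = 0x8000000000000001` and `hi = 1`, `model_shl(x, 9)` gives `lo = 0x200` and `hi = 0x300`: the top bit of the low lane lands in bit 8 of the high lane, which a lane-wise shift alone would have thrown away.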

… and the Perl code is:

    # Fixed header: include, helper macros, and the start of the switch.
    print <<'__HEAD';
    #include <emmintrin.h>
    #define shl _mm_slli_epi64
    #define shr _mm_srli_epi64
    #define SHL _mm_slli_si128
    #define C1(n) x = SHL(x, n/8)
    #define C2(n) x = shl(SHL(x, n/8), n%8)
    #define C5(n) x = _mm_or_si128(shl(x, n%8), shr(SHL(x, 8), 64-n%8))
    #define C6(n) x = _mm_or_si128(shl(SHL(x, n/8), n%8), shr(SHL(x, 8+n/8), 64-n%8))

    __m128i xm_shl(__m128i x, unsigned nbits)
    {
        switch (nbits) {
        case 0: break;
    __HEAD

    # One case per count: C5 below 8, C1 for multiples of 8,
    # C6 for other counts below 64, C2 for other counts above 64.
    print "\tcase $_: C".($_<8 ? 5 : $_%8 ? $_<64 ? 6 : 2 : 1)."($_); break;\n"
        for 1..127;

    # Fixed footer: counts of 128 or more clear the register.
    print <<'__FOOT';
        default: x = _mm_setzero_si128();
        }
        return x;
    }
    __FOOT
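
If you just want to see how the generated function gets called, here is a small sketch. It does not use the real generated file: `xm_shl_sample` is a cut-down stand-in that only fills in the sample counts listed earlier (0, 1, 8, 9, 65), and `shl128_sample` is a hypothetical wrapper that loads and stores the value through a two-element `uint64_t` array, with element 0 holding the low 64 bits. It assumes an x86 compiler with SSE2 available.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Cut-down stand-in for the generated xm_shl(): only counts 0, 1, 8, 9 and 65
 * are filled in. Returns 0 on success and -1 for a count this sketch does not
 * cover (the full generated switch has a case for every count in 1..127).
 * Counts of 128 or more clear the register, like the generated default case. */
static int xm_shl_sample(__m128i *px, unsigned nbits)
{
    __m128i x = *px;
    if (nbits >= 128) {
        *px = _mm_setzero_si128();
        return 0;
    }
    switch (nbits) {
    case 0:  break;
    case 1:  x = _mm_or_si128(_mm_slli_epi64(x, 1),
                              _mm_srli_epi64(_mm_slli_si128(x, 8), 64 - 1));
             break;
    case 8:  x = _mm_slli_si128(x, 1);
             break;
    case 9:  x = _mm_or_si128(_mm_slli_epi64(_mm_slli_si128(x, 9 / 8), 9 % 8),
                              _mm_srli_epi64(_mm_slli_si128(x, 8 + 9 / 8), 64 - 9 % 8));
             break;
    case 65: x = _mm_slli_epi64(_mm_slli_si128(x, 65 / 8), 65 % 8);
             break;
    default: return -1;   /* not one of the sample counts */
    }
    *px = x;
    return 0;
}

/* Hypothetical wrapper: shifts the 128-bit value in[1]:in[0] left by nbits
 * and writes the result to out. in[0] and out[0] are the low 64 bits. */
static int shl128_sample(const uint64_t in[2], unsigned nbits, uint64_t out[2])
{
    assert(sizeof(__m128i) == 2 * sizeof(uint64_t));
    __m128i x = _mm_loadu_si128((const __m128i *)in);   /* unaligned load */
    int rc = xm_shl_sample(&x, nbits);
    if (rc == 0)
        _mm_storeu_si128((__m128i *)out, x);            /* unaligned store */
    return rc;
}
```

With `in = { 0x8000000000000001, 1 }`, a shift by 1 yields `{ 2, 3 }` and a shift by 65 yields `{ 0, 2 }`. To use the real thing, swap the stand-in for the `xm_shl` that the Perl script prints.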

Good luck, and Happy New Year’s!