SSE2 bit shift

For some reason, this blog gets a ridiculous number of Google hits for the string “SSE2 bit shift”. I never posted verbatim code for that, just cpp hints on working around the silly const-arg-ness of counts for SSE2 shift operators. I’m really curious whether people are coming here and not finding what they need. Drop me a comment if that was you.

And since it’s Christmas, here’s the perl code that generates the C code for a left shift of a 128-bit register by a variable bit count. The fastest code has a 129-way switch statement: cases 0 through 127, plus a default. The perl code generates sequences of 1, 2, 5 or 6 register instructions, depending on the shift count. The following are the kinds of cases it generates:

case 0: break;
case 1: x = _mm_or_si128(_mm_slli_epi64(x, 1), 
                         _mm_srli_epi64(_mm_slli_si128(x, 8), 64-1));
    break;
...
case 8: x = _mm_slli_si128(x, 1); break; // multiples of 8
case 9: x = _mm_or_si128(_mm_slli_epi64(_mm_slli_si128(x, 9/8), 9%8),
                         _mm_srli_epi64(_mm_slli_si128(x, 8+9/8), 64-9%8));
    break;
...
case 65: x = _mm_slli_epi64(_mm_slli_si128(x, 65/8), 65%8); break;
...
default: x = _mm_setzero_si128(); break;
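To see why the two-part "case 9" pattern gives a true 128-bit shift, here is a scalar sketch that models the three intrinsics on a pair of 64-bit integers and compares the result with a bit-at-a-time reference. It is only an illustration, not part of the generated code: `reg128`, `lane_shl`, `lane_shr`, `byte_shl`, `shl_two_part`, `shl_ref` and `count_mismatches` are all made-up names, and the models only cover the count ranges that the switch actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for __m128i: lo = bits 0..63, hi = bits 64..127. */
typedef struct { uint64_t lo, hi; } reg128;

/* Models _mm_slli_epi64 for counts 0..63: each 64-bit lane shifts left on its own. */
reg128 lane_shl(reg128 x, unsigned bits)
{
    x.lo <<= bits;
    x.hi <<= bits;
    return x;
}

/* Models _mm_srli_epi64 for counts 0..63: each 64-bit lane shifts right on its own. */
reg128 lane_shr(reg128 x, unsigned bits)
{
    x.lo >>= bits;
    x.hi >>= bits;
    return x;
}

/* Models _mm_slli_si128 for byte counts 0..15: the whole register shifts left. */
reg128 byte_shl(reg128 x, unsigned bytes)
{
    unsigned bits = 8 * bytes;
    reg128 r = x;
    if (bits >= 64) {
        r.hi = x.lo << (bits - 64);
        r.lo = 0;
    } else if (bits > 0) {
        r.hi = (x.hi << bits) | (x.lo >> (64 - bits));
        r.lo = x.lo << bits;
    }
    return r;
}

/* The "case 9" pattern; with n < 8 it reduces to the "case 1" pattern. */
reg128 shl_two_part(reg128 x, unsigned n)
{
    assert(n >= 1 && n <= 63 && n % 8 != 0);
    /* Bits that stay inside their own 64-bit lane. */
    reg128 kept = lane_shl(byte_shl(x, n / 8), n % 8);
    /* Bits that cross from the low lane into the high lane. */
    reg128 carried = lane_shr(byte_shl(x, 8 + n / 8), 64 - n % 8);
    reg128 r = { kept.lo | carried.lo, kept.hi | carried.hi };
    return r;
}

/* Reference: shift one bit at a time, carrying bit 63 into bit 64. */
reg128 shl_ref(reg128 x, unsigned n)
{
    while (n--) {
        x.hi = (x.hi << 1) | (x.lo >> 63);
        x.lo <<= 1;
    }
    return x;
}

/* Counts the shift amounts in 1..63 (skipping multiples of 8) where the two disagree. */
int count_mismatches(void)
{
    reg128 x = { 0x0123456789abcdefULL, 0xfedcba9876543210ULL };
    int bad = 0;
    for (unsigned n = 1; n < 64; n++) {
        if (n % 8 == 0)
            continue;
        reg128 a = shl_two_part(x, n), b = shl_ref(x, n);
        bad += (a.lo != b.lo || a.hi != b.hi);
    }
    return bad;
}
```

The second operand is the interesting part: shifting the register left by a further 8 bytes parks the low lane in the high lane, and the right shift by 64-n%8 then leaves exactly the bits that have to cross the lane boundary.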

… and the perl code is:

print <<'__HEAD';
#include <emmintrin.h>
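#define shl _mm_slli_epi64   /* shift each 64-bit lane left, count in bits */
#define shr _mm_srli_epi64   /* shift each 64-bit lane right, count in bits */
#define SHL _mm_slli_si128   /* shift the whole register left, count in bytes */
/* C1: n a multiple of 8.  Otherwise C5: n < 8;  C6: 8 < n < 64;  C2: n > 64. */
#define C1(n) x = SHL(x, (n)/8)
#define C2(n) x = shl(SHL(x, (n)/8), (n)%8)
#define C5(n) x = _mm_or_si128(shl(x, (n)%8), shr(SHL(x, 8), 64-(n)%8))
#define C6(n) x = _mm_or_si128(shl(SHL(x, (n)/8), (n)%8), shr(SHL(x, 8+(n)/8), 64-(n)%8))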

__m128i xm_shl(__m128i x, unsigned nbits)
{
switch (nbits) {
case 0: break;
__HEAD

# 1..7 -> C5; multiples of 8 -> C1; other counts below 64 -> C6; the rest -> C2
print "\tcase $_: C".($_<8 ? 5 : $_%8 ? $_<64 ? 6 : 2 : 1)."($_); break;\n"
  for 1..127;
print <<'__FOOT';
default: x = _mm_setzero_si128();
}
return x;
}
__FOOT
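The nested ternary in the generator is terse, so here is the same selection logic spelled out as a small C sketch. The names `pick_macro` and `format_case` are hypothetical and appear nowhere in the generated code; they just mirror what the perl one-liner computes and prints for each count.

```c
#include <assert.h>
#include <stdio.h>

/* Returns which of the C1/C2/C5/C6 macros handles a left shift of n bits. */
int pick_macro(unsigned n)
{
    assert(n >= 1 && n <= 127);
    if (n < 8)
        return 5;   /* bits only, with a carry from the low lane to the high lane */
    if (n % 8 == 0)
        return 1;   /* whole bytes: a single byte shift */
    if (n < 64)
        return 6;   /* bytes plus bits, with a carry */
    return 2;       /* n > 64: the high lane is discarded, so no carry is needed */
}

/* Formats one line of the switch body the way the perl script prints it. */
int format_case(char *buf, size_t size, unsigned n)
{
    return snprintf(buf, size, "\tcase %u: C%d(%u); break;\n", n, pick_macro(n), n);
}
```

The order of the tests matters: the n < 8 check comes first, and since n starts at 1, the multiple-of-8 test never sees a zero count.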

Good luck, and Happy New Year’s!

About mischasan

I've had the privilege to work in a field where abstract thinking has concrete value. That applies at the macro level (optimizing actions on terabyte databases) or the micro level (fast parallel string searches in memory). You can find my documents on production-system radix sort (NOT just for academics!) and some neat little tricks for developers, on my blog: https://mischasan.wordpress.com. My e-mail sig (since 1976): Engineers think equations approximate reality. Physicists think reality approximates the equations. Mathematicians never make the connection.
