SSE2 bit matrix transpose special case … 8 x 256 … for Marek

It turns out that Marek’s Idea of the Day: Bitsliced SipHash used my SSE2 bit-matrix transpose routine, but it wasn’t fast enough. This is normally the case for SSE2: the more specific the problem, the better the code can be. I have no way of contacting Marek, but, if you see this link, here’s the optimal code for transposing an 8 x 256 bit matrix. It cuts the number of operations in half. The trick is, knowing that there are only 8 rows, the XMM operations can transpose two sets of 8 bytes simultaneously. The change is worth adding as a special case to the full transpose routine, but for simplicity’s sake, here’s the part Marek could use:

void bitmat_trans_8x256(char *inp, char *out)
    uint16_t h, *hinp = (uint16_t*)inp;
    union { __m128i x; uint8_t b[16]; } tmp;
    int r, c;

    for (c = 0; c < 16; ++c) {
        for (r = 0; r < 7; ++r) {
            tmp.b[r] = h = hinp[r * 16 + c];
            tmp.b[r + 8] = h >> 8;

        for (r = 0; r < 7; ++r) {
            out[c * 16 + r] = h = _mm_movemask_epi8(tmp.x);
            out[c * 16 + r + 8] = h >> 8;
            tmp.x = _mm_slli_epi64(tmp.x, 1);

Compiled on a Core7 using gcc 4.4.6, it amounts to a 70-instruction loop repeated 16 times.


About mischasan

I've had the privilege to work in a field where abstract thinking has concrete value. That applies at the macro level --- optimizing actions on terabyte database --- or the micro level --- fast parallel string searches in memory. You can find my documents on production-system radix sort (NOT just for academics!) and some neat little tricks for developers, on my blog My e-mail sig (since 1976): Engineers think equations approximate reality. Physicists think reality approximates the equations. Mathematicians never make the connection.
This entry was posted in algorithm, bit, bit shift, SSE2 and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s