*For the full “C” code that uses this idea for an arbitrary-length byte vector, see*

this later blog post

In a discussion about all the wonderful uses of the combination movemask(pcmpxx(a,b)), it occurred to me that this gives you a fast XMM version of find-first-set and find-last-set (bit) operations. Pardon my C-centrism in preferring bit positions 0..127 (with -1 for no-bit-set) rather than the 1… convention of the x86 ops (bsfl,bsrl).

int xm_ffs(__m128i x) {
int pos = _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128()));
pos = ffs((uint16_t)~pos) - 1;
return pos < 0 ? -1
: (pos << 3) + ffs(((unsigned char const*)&x)[pos]) - 1;
}

Note the folderol needed because x86 has **_mm_cmpeq_epi8, _mm_cmplt_epi8, _mm_cmpgt_epi8**, but no **_mm_cmpneq_epi8**. Go figure.

And if anyone comes up with a cleverer way to index bytes of an XMM value, I’d love to see it.

### Like this:

Like Loading...

*Related*

## About mischasan

I've had the privilege to work in a field where abstract thinking has concrete value. That applies at the macro level --- optimizing actions on terabyte database --- or the micro level --- fast parallel string searches in memory. You can find my documents on production-system radix sort (NOT just for academics!) and some neat little tricks for developers, on my blog https://mischasan.wordpress.com
My e-mail sig (since 1976):
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.