“Unusual uses of SSE2” posted to github

In this month's frenzy of putting source code out there in a usable form, I've posted source to GitHub for the SSE2 implementations of string search, BNDM search, sorting [16] doubles, and bit-matrix transpose, plus some convenience tools for SSE2.

SSE2 beats SSE4.2 in memcmp?

At the moment I don't have a box where I can test the latest GCC compilers and SSE4.2 support (pcmpestri etc.). So far, the following beats GCC 4.4 with -march=corei7 -msse4.2 (okay, perhaps that's redundant :-). But GCC generates "repz cmpsb"
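For orientation only, here is a minimal sketch of what a 16-bytes-at-a-time SSE2 compare can look like. It is not the listing from the post, and no timing claim is attached to it. The name `sse2_memcmp` is made up for illustration, the loads are unaligned for simplicity, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical memcmp-style compare: <0, 0 or >0, like memcmp. */
static int sse2_memcmp(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    size_t i = 0;

    /* Compare 16 bytes per iteration. */
    for (; i + 16 <= len; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(pa + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(pb + i));
        /* movemask gives one bit per equal byte; flip to get differing bytes. */
        unsigned diff = 0xFFFFu ^ (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (diff) {
            /* Lowest set bit = first differing byte in this block. */
            size_t k = i + (size_t)__builtin_ctz(diff);
            return (int)pa[k] - (int)pb[k];
        }
    }

    /* Scalar tail for the last len % 16 bytes. */
    for (; i < len; i++)
        if (pa[i] != pb[i])
            return (int)pa[i] - (int)pb[i];
    return 0;
}
```

The result is taken from the first differing byte as unsigned values, which matches memcmp's ordering.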

The Generic SSE2 Loop

In response to a couple of comments on my post about find-first-bit-set in SSE2 registers, amounting to "what use is a routine that only does 16-byte bitvecs?", I thought I'd post the canonical, generic loop through memory using SSE2 ops.
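As a rough idea of the shape such a loop takes, here is a minimal sketch that scans a buffer for a byte value: a 16-byte SSE2 step, a find-first-bit-set on the resulting mask, and a scalar tail. It is not the code from the post. The name `sse2_find_byte` is hypothetical, unaligned loads are used instead of any head/alignment handling, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: offset of the first byte equal to c in
 * buf[0..len), or len if there is none. */
static size_t sse2_find_byte(const void *buf, size_t len, unsigned char c)
{
    const unsigned char *p = (const unsigned char *)buf;
    const __m128i needle = _mm_set1_epi8((char)c);
    size_t i = 0;

    /* Main loop: one 16-byte block per iteration. */
    for (; i + 16 <= len; i += 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)(p + i));
        /* One mask bit per byte position that matched. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(block, needle));
        if (mask)
            return i + (size_t)__builtin_ctz((unsigned)mask);
    }

    /* Scalar tail for the remaining len % 16 bytes. */
    for (; i < len; i++)
        if (p[i] == c)
            return i;
    return len;
}
```

The 16-byte primitive only ever sees one block; the surrounding loop is what turns it into a routine over arbitrary-length memory.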

