A tiny asynchronous DNS client posted to github

There’s a world out there beyond the “bind” daemon … and this DNS client library can access it and show you how it works. This is my reference model for writing a tiny independent library plugin to another event-loop mainline, without callbacks. and without compromises. It also has a few good tricks for using a hash table as a cache — deletion by TTL is incremental. https://github.com/mischasan/madns

Posted in Uncategorized | Tagged | Leave a comment

HXFILE source posted to github

In case you’ve read my paper on highly-concurrent linear hash files, and want to see/use the source, I’ve posted it to https://github.com/mischasan/hx. It’s been a lot of fun, and I’ve used it in an OLAP product and some performant key-value lookup apps. As-is, it is limited to 128TB per file and 32K per record.

Posted in Uncategorized | 2 Comments

Aho-Corasick code posted to github

You may be interested in the source code for my best-of-both-worlds Aho-Corasick implementation. It runs on a state-transition matrix (the fastest implementation) in an interleaved array (the smallest-data implementation). Typically 2.5 bytes of state machine for each byte of pattern text. The matrix is tuned for L1-cache-friendliness. The original google doc describing the technique is http://goo.gl/6decc. https://github.com/mischasan/aho-corasick

So what’s the big deal about this? Well, back in the late 70’s, Robert Tarjan tried this … but found that the “compile” step was ridiculously expensive, and unsuitable for large sets of patterns. He based his work on another researcher’s ideas about “interleaved arrays” which is a very cool idea (“ACISM” in the github source refers to “Aho-Corasick Interleaved State-transition Matrix”). Unfortunately, the idea of how best to do this was by a physical analogy: if you want to fill a bucket with rocks, gravel and sand, put the bigger (coarser) things in first, because the finer-grain things will fit in the gaps around them. Unfortunately, that produces the worst-case runtime behaviour.

Instead, I tried inserting the objects in random order, while using a “hint” array to skip searching in areas where a given object couldn’t possibly fit. Tarjan was right about interleaved arrays, just missed a trick in implementation. See the .pdf at github, or the google.doc article, to see how the compilation works. Or just have fun using the code.

Posted in Uncategorized | 4 Comments

SSE2 bit matrix transpose special case … 8 x 256 … for Marek

It turns out that Marek’s Idea of the Day: Bitsliced SipHash used my SSE2 bit-matrix transpose routine, but it wasn’t fast enough. This is normally the case for SSE2: the more specific the problem, the better the code can be. I have no way of contacting Marek, but, if you see this link, here’s the optimal code for transposing an 8 x 256 bit matrix. It cuts the number of operations in half. The trick is, knowing that there are only 8 rows, the XMM operations can transpose two sets of 8 bytes simultaneously. The change is worth adding as a special case to the full transpose routine, but for simplicity’s sake, here’s the part Marek could use:

void bitmat_trans_8x256(char *inp, char *out)
    uint16_t h, *hinp = (uint16_t*)inp;
    union { __m128i x; uint8_t b[16]; } tmp;
    int r, c;

    for (c = 0; c < 16; ++c) {
        for (r = 0; r < 7; ++r) {
            tmp.b[r] = h = hinp[r * 16 + c];
            tmp.b[r + 8] = h >> 8;

        for (r = 0; r < 7; ++r) {
            out[c * 16 + r] = h = _mm_movemask_epi8(tmp.x);
            out[c * 16 + r + 8] = h >> 8;
            tmp.x = _mm_slli_epi64(tmp.x, 1);

Compiled on a Core7 using gcc 4.4.6, it amounts to a 70-instruction loop repeated 16 times.

Posted in algorithm, bit, bit shift, SSE2 | Tagged , , | Leave a comment

Non-recursive Make part 3 – a tool for the fearless

In part 1, I mentioned that it is typically painful to attack a large project that uses recursive make. Nonetheless, if you’re willing to make the effort, here’s a tool to help you pick apart what’s really going on in a project tree of recursive makes.

Have you ever looked at the output of “make -nps”? It shows exactly what variables, rules and actions are relevant to the make target(s) you specify. On the down side, it includes masses of info that is hard for you to parse. On the up side, you don’t have to. The following Perl script analyzes “make -nps …” output, and builds a bog-simple makefile, with relevant variable settings, conditional settings, rules and actions. After that, it’s up to you to follow the rules of NR make: all variables, targets and pathnames must begin with an identifier only used by that project.

#!/usr/bin/perl -w
NOTE: "rules" output will include some partial extraneous rules
        unrelated to the goal (explicit or default).
use strict;
use Cwd; my $cwd = getcwd;
my %ignore = map {$_=>1}

my (%set, %use, %deps, %acts, $skip, $rules, $files, $targ);
open my $ph, '-|', "gmake -nps @ARGV" or die "Cannot run make";
while () {
    $rules = 0 if m{^# Variables};
    $rules = 1 if m{^# Implicit};
    $skip  = 1 if $rules and m{^# Not a target:|^[^#:=\t]*[%(][^=]*:};
    $skip  = 0 if m{^$};
    next if $skip;
    if (my ($var,$imm,$val) = m{^(\S+)\s*(:)?(= ..*)} and !$ignore{$1}) {
        $val =~ s{#}{\\#}g;
        $val =~ s{(\s+)$}{$1#};
        $use{$_} = 1 for $val =~ m{\$[({]([-.\w]+)(?::[^)]*)[})]}g;
        $use{$var} = 1 if $imm ||= '';
        $set{$var} = "$imm$val";
    } elsif (m{^\t.} and $rules) {
        $acts{$targ} .= $_;
        $use{$_} = 1 for m{\$\(([-.\w]+)\)}g;
        $use{$_} = 1 for m{\$\{([-.\w]+)\}}g;
        $use{$targ} = 1 if m{\$\(\$@\)} or m{\$\{\$@\}}
    } elsif (m{^# ((\w+) ([:+])?= .*)}) {
        $deps{$targ} = "$targ: $1\n".($deps{$targ} || '') if $targ;
        $use{$2} = 1 if $3;
    } elsif ($rules and m{^(\S+):\s*(.*)}) {
        $targ = $1;
        $deps{$1} = "$1: $2" if $2;
    } elsif (m{^\s*$}) {
        $targ = '';

# Include var values have nested variable expansions
# Recursively determine vars used.
my @curr = keys %use;
while (@curr) {
    my $vals = join ' ', map {$set{$_} || ''} @curr;
    @curr = grep {!$use{$_}++}
                $vals =~ m{\$[({]([-.\w]+)(?::[^)]*)[})]}g;

my @vars = sort grep {$set{$_} ||= $ENV{$_}} keys %use;
my $wid = 0;
for (@vars) { $wid = length if $wid < length; }
printf "%-${wid}s %s\n", $_, $set{$_} for @vars;
$acts{$_} ||= '' for keys %deps;
$deps{$_} ||= "$_:" for keys %acts;
print "#----";
s{$cwd/}{}mg for values %deps;
s{//+}{/}mg  for values %deps;
print "\n$deps{$_}\n$acts{$_}"
    for sort grep {$acts{$_} or $deps{$_}} keys %deps;
print "# vim: set nowrap:\n";

Here’s what “rules” can do for the subproject GNUmakefile in Non-recursive make (gmake) part 2:

-include ../rules.mk
export madns    ?= .
# madns imports libtap from util:
util            ?= ../util
PATH            := $(madns):$(PATH)
#---------------- PRIVATE vars:
madns.test      = $(madns)/madns_t
#---------------- PUBLIC TARGETS (see rules.mk):
all             : madns.all
test            : madns.test
install         : madns.install
#---------------- inputs to "install":
madns.bin       = $(madns)/hostip
madns.lib       = $(madns)/madns.a
#---------------- inputs to "clean":
all             += madns
#---------------- PRIVATE TARGETS:
madns.all       : $(madns.bin) $(madns.lib) $(madns.test)
madns.test      : $(madns.test:%=%.pass)

$(madns.bin)    : $(madns.lib)
$(madns.lib)    : $(madns)/madns.o

$(madns.test)   : LDLIBS += -pthread
$(madns.test)   : CFLAGS += -I$(util)
$(madns.test)   : $(madns.lib) $(util)/libtap.a

.INTERMEDIATE   : $(madns)/madns.o $(madns)/madns_t.o

-include $(madns)/*.d

… and the “rules” output, with environment variable $BLD set to debug is:

BLD           = debug
CC            = cc
CFLAGS        = -g -MMD -fdiagnostics-show-option -fstack-protector --param ssp-buffer-size=4 -fno-strict-alias
CPPFLAGS      = -mtune=native -I$(PREFIX)/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE $(CPPFLAGS.$(BLD))
LINK.o        = $(CC) $(LDFLAGS) $(TARGET_ARCH)
OSNAME        := Linux
PATH          := .:/home/mischas
PREFIX        = /usr/local
all           = madns
        [ "$^" ] && ar crs $@ $(filter %.o,$^)

.INTERMEDIATE: madns.o madns_t.o

.PHONY: all clean cover debug gccdefs install profile source tags test

all: madns.all
        @echo "$@ done for BLD='$(BLD)'"

        @rm -rf $(shell $(MAKE) -nps all test cover profile | sed -n '/^# I/,$${/^[^\#\[%.][^ %]*: /s/:.*//p;}') \
        $(clean)  $(foreach A,$(all), $($A)/{gmon.out,tags,*.fail,*.gcda,*.gcno,*.gcov,*.prof}) \
        $(filter %.d,$(MAKEFILE_LIST))

        @$(CC) $(CPPFLAGS) -E -dM - </dev/null | cut -c8- | sort

hostip: hostip.o madns.a
        $(LINK.o) $^ $(LOADLIBES) $(LDLIBS) -o $@

hostip.o: hostip.c
        $(COMPILE.c) $(OUTPUT_OPTION) $<

install: madns.install

madns.a: madns.o
        [ "$^" ] && ar crs $@ $(filter %.o,$^)

madns.all: hostip madns.a madns_t

madns.o: madns.c
        $(COMPILE.c) $(OUTPUT_OPTION) $<

madns.test: madns_t.pass

madns_t: LDLIBS += -pthread
madns_t: CFLAGS += -I$(util)
madns_t: madns_t.o madns.a ../util/libtap.a
        $(LINK.o) $^ $(LOADLIBES) $(LDLIBS) -o $@

madns_t.o: madns_t.c
        $(COMPILE.c) $(OUTPUT_OPTION) $<

test: madns.test
# vim: set nowrap:

Almost all the above actions are GNUmakefile defaults. Default actions are not expanded unless specified; “rules install” will also show the “madns.install” target’s actions.

Posted in make, non-recursive make | Leave a comment

The “C” preprocessor: not as cryptic as you’d think

The C preprocessor is a modest macro-expansion language (check out “m4” if you want to see an immodest one).  Basic symbols and function-macros are convenient for giving meaningful names to constants and tiny function calls, with the rewarding feeling that you’re helping the compiler do its optimizing best.

As an aside, you can try out these samples using your normal compile commands but substituting cc -E … for cc -c … . I am going to assume you can find out what #X and A##B mean to the precompiler.

An unfortunate truth about cpp coding is, it can quickly make it impossible to figure out what actually is happening (qv my post on identifying what is being #included, and how to find out which #if’s control a given line’s inclusion.

Macros can make life hell for people who heavily rely on gdb. Like programs built with C++ templates, there’s often a big disjunct between the source code, and the symbols the debugger displays. Also, you can’t set a breakpoint on a macro. For that reason, static inline functions are preferrable, even to tiny macro functions.

Badly written macros can create odd syntax errors, or silently destroy control flow in your program
(Please pardon the squished-together code layout)

#define foo(str) { puts("error:"; puts(str) }
    if (a != b) foo(a); else puts("okay");

The if-statement would be fine if foo were a real function; but braces mess up the (admittedly peculiar) requirements of if-statement syntax. The common defense is to use a peculiar idiom:

#define foo(str) do { puts("error:"; puts(str) } while(0)

The precompiler is useful when nothing but constants are allowed, and the code becomes extremely repetitive.

For example, the SSE2 shift operators only take constant shift-distance arguments. This results in repetitive case statements, particularly if you are trying get the most out of the compiler (fewest calls and branches).

The practical example here is an optimal shift-left function for 128-bit XMM registers.
This takes a 127-way case statement of the form:

__m128i xm_shift_left(__m128i x, int bits)
    switch (bits) {
    case 0: break;
    case 1: ... break;
    case 2: ... break;
    case 127: ... break;
    default: x = _mm_setzero_si128(); 
    return x;

Now the kicker: optimal instruction sequences for those cases break down into four different formulas:

  • for n=8,16, …120, a single instruction:
    _mm_slli_si128(x, n/8)
  • for n=1..7, a five-instruction sequence that I may dissect in another post:
    _mm_or_si128(_mm_slli_epi64(x, n), 
                 _mm_srli_epi64(_mm_slli_si128(x, 8), 64-n))
  • for n=9..15 17..23 … 57..63, a six-instruction sequence:
    _mm_or_si128(_mm_slli_epi64(_mm_slli_si128(x, n/8), n%8), 
                 _mm_srli_epi64(_mm_slli_si128(x, 8+n/8), 64-n%8))
  • and for n=65..71 73..79 … 121..127, a two-instruction sequence:
    _mm_slli_epi64(_mm_slli_si128(x, n/8), n%8)

You can spend some time generating the case statement by hand; or you can use a bit of automation.

Originally, I #define’d four macros (F1(n), F2(n), F5(n), F6(n)) then used this perl command

perl -e 'print "case $_: x=F"
               .($_ < 8 ? 5 : $_ % 8 ? $_ < 64 ? 6 : 2 : 1)
               ."($_); break;\n" for 1..127

Using perl to generate embedded “C” code has its own problems; but you can do the equivalent with macros.

The trick is, if you have a fixed (small) number of repetitions (e.g for i = 1 to 7), you can expand them in line.
The above ranges can be broken into single loops of 1..7 and double loops of 1..7 x 1..7. First, an inner- and outer-loop macro:

// for (i = 1; i < 8; i++) f(x + i);
#define BY_1(f,x) f((x)+1) f((x)+2) f((x)+3) \
                  f((x)+4) f((x)+5) f((x)+6) f((x)+7)

// for (i = 8; i < 64; i += 8)  f(y + i);
#define BY_8(f,y) BY_1(f,(y)+1*8) BY_1(f,(y)+2*8) BY_1(f,(y)+3*8) \
                  BY_1(f,(y)+4*8) BY_1(f,(y)+5*8) BY_1(f,(y)+6*8) \

Then the four functions, written as case statements:

#define F1(n) case (n)*8: x = _mm_slli_si128(x, n); break;
#define F2(n) case n: x = _mm_slli_epi64(_mm_slli_si128(x, n/8), \
                                         n%8); break;

And now the loops:

    BY_1(F5, 0) // 1..7
    BY_1(F1, 0) // 8,16, ... 56
    BY_1(F1, 7) // 64,72, ... 120
    BY_8(F6, 0) // 9..15, 17..23, ... 57..63
    BY_8(F2,56) // 65..71, 73..79, ... 121..127

Try this cc -E, and you will see output that — with a bit of whitespace fixups — looks like:

    case (0)+7: x = _mm_or_si128(_mm_slli_epi64(x, (0)+7), 
                                  _mm_srli_epi64(_mm_slli_si128(x, 8),
    case (0)+1*8: x = _mm_slli_si128(x, (0)+1); break;
    case ((56)+1*8)+1: x = _mm_slli_epi64(_mm_slli_si128(x, ((56)+1*8)+1/8),

Other kinds of lists

Another use of more-than-vanilla macros is to put definitions of a list of values in a single place.

For example, I have an enum for about 100 error return values: HXERR_BAD_REQUEST, HXERR_BAD_RECORD, HXERR_BAD_FILE … Among other things, I want string versions of those enum values, for logging. I could define the two lists separately, and risk a hard-to-spot mismatch. Or, I could do this:

               _E(BAD_FILE) _S _E(READ) ...
#define _S ,
#define _E(x) HXERR_##x = _HXERR_##x
enum { _HXERR_OK=0, HXERRS };

I can tweak that so the error enums are negative:

#undef _E
#define _E(x) HXERR_##x = -_HXERR_##x
enum { HXERR_OK=0, HXERRS };

What’s happening here? _S is a macro for the “separator”. _E(x) is a mnemonic for “each” or “element”. Setting _S and _E makes the HXERRS macro expand in different ways.

That defined the enum. Now I want the corresponding list of printable strings, probably in some private source file:

#define _S ,
#define _E(x) #x
    static char const *symv[] = { "okay", HXERRS };

Yes, this is NOT as obvious as hard-coding the enum and string list. But it’s a good solution when the problem is a list duplicated several times and/or in different files.

That’s it for now. Next up: convergence between preprocessor programming and function programming.

Posted in bit shift, preprocessor | Tagged , , , , , , | 6 Comments

Non-recursive make (gmake) part 2: rules.mk

I love the power of GMAKE’s implicit rules and actions. The fewer magical relationships, the better. Ask me how I feel about dev/runtime frameworks that tie your elbows together as you try to write/extend code.

I’m going to assume you are familiar with most GMAKE constructs. For those who aren’t, here’s is my favourite gmake quick-ref.

So rules.mk is all about extending the existing rules, and sticking to pattern rules and variables that GMAKE default rules/actions use. Here’s rules.mk in sections:

ifndef RULES_MK
RULES_MK:=# Prevent repeated "-include".
...all the rest of the file...

The guard variable RULES_MK, as in #include headers, allows rules.mk to be repeatedly included in project and sub-project directories. It’s pretty much the only use I’ve had for GMAKE’s source-level conditionals.

# Import from PREFIX, export to DESTDIR.
PREFIX  ?= /usr/local
OSNAME  := $(shell uname -s)
# Before gcc 4.5, -Wno-unused-result was unknown and causes an error.
Wno-unused-result := $(shell gcc -dumpversion | awk '$$0 >= 4.5 {print "-Wno-unused-result"}')

I use “:=” with $(shell …), so these are only evaluated once. Again if anyone can figure out a more elegant way to solve the above, I’m all ears.

#---------------- Explicitly CANCEL EVIL BUILTIN RULES:
%   : %.c
%.c : %.l
%.c : %.y
%.r : %.l

The comment says it all. They will bite you when you least expect it.

CFLAGS  += -g -MMD -Wall -Werror -Wextra $(Wno-unused-result) $(CFLAGS.$(BLD))
CPPFLAGS += -mtune=native -I$(PREFIX)/include -D_GNU_SOURCE $(CPPFLAGS.$(BLD))

Here’s where all the $(BLD)- and $(OSNAME)- dependent flags are added in. I try to be rigorous about separating C preprocessor flags from C flags; and much to my surprise, some “-m” flags affect the preprocessor.

The CFLAGS element “-MMD” makes GCC generate the make-compatible dependency files, one per source file, that are included with the “-include $(madns)/*.d” lines in sub-project makefiles.

.DEFAULT_GOAL   := all
all   :;@echo "$@ done for BLD='$(BLD)'"

Each sub-project “foo” defines a dependency “all : foo.all”, then makes foo.all depend on all its local targets.
$(BLD) is the build-type variable: debug cover profile or default (release).

clean :;@rm -rf $(shell $(MAKE) -nps all test cover profile \
                       | sed -n '/^# I/,$${/^[^\#\[%.][^ %]*: /s/:.*//p;}') \
                 $(filter %.d,$(MAKEFILE_LIST))

You may recognize this as one of my GMAKE tricks posts. A bit cryptic, but using make’s own knowledge of intermediate targets means you never miss anything.

Within a sub-makefile it is possible to have specific clean-up targets; e.g.

clean : madns.clean
madns.clean :;@rm -rf $(madns.tmpdirs)

Remember the “madns.bin” and “madns.include” variables in part 1? I referred to them as standard inputs for the “install” target. Here’s one way “make install” can be implemented:

Install   = [ -z "$($1.$2)" ] || cp -p $($1.$2) $(DESTDIR)/$2/
%.install : $(patsubst %, $(DESTDIR)/%/.., bin include lib) $(%.bin) $(%.include) $(%.lib) \
          ; $(call Install,$*,bin); $(call Install,$*,include); $(call Install,$*,lib)

Install is a GMAKE macro
Now come the basic pattern rules:

%.test : $(%.test)
%.pass : % ; $(@:.pass=) >$(@:.pass=.fail) 2>&1 && mv -f $(@:.pass=.fail) $@

This is useful for running many tests in parallel without confusing their output.
$(%.test) is a list of “.pass” filenames, such as “$(madns)/madns_t.pass”.
The above will run:

$ madns/madns_t >& madns/madns_t.fail && mv -f madns/madns_t.fail madns/madns_t.pass

In other words, if the test fails, there will be a *.fail file left behind, containing (only) this test’s failure output. If not, the *.pass file will mark that the test passed, and (until the next “make clean”) will not re-run the test.

%.so : CFLAGS := -fPIC $(filter-out $(CFLAGS.cover) $(CFLAGS.profile), $(CFLAGS))

This is GMAKE’s target-specific variable setting. It applies to any %.so target, and any dependency of that target. It ensures that, if code is compiled to be in a .so, it is compiled with -fPIC and without any coverage or profiling flags … which usually make the .so unloadable due to static dependencies.

%.so : %.a ; $(CC) $(CFLAGS)  -o $@ -shared -Wl,-whole-archive $< $(LDLIBS)
%.a  :          ; [ "$^" ] && ar crs $@ $(filter %.o,$^)

Here’s an example of writing explicit commands that stick to using ONLY the variables that GMAKE default rules use.

The %.a rule has that odd conditional ‘[ “$^” ]’. Why? Note back in part 1, that “madns_t” depends on “$(util)/libtap.a”. That sub-project makefile has no way to create that file if it does not exist; but it is convenient to list that file in madns_t’s dependencies. The ‘[ “$^” ]’ tests whether any dependencies have been defined, from which to build $(util)/libtap.a. If they haven’t then MAKE stops on an error.

%.yy.c  : %.l       ; flex -o $@ $<
%.tab.c : %.y       ; bison $<
%/..    :           ;@mkdir -p $(@D)

Here are examples of the correct rules for building from [f]lex and yacc/bison source files.
And the %/.. rule is a cheap way to ensure that a required directory (dependency) is created.

That’s about it. There are minor bells and whistles that make “clean” more thorough; but I will be posting the complete project on github.com shortly.

Posted in make, non-recursive make | Tagged | 3 Comments