Bug fix for ACISM

Several people have used the Aho-Corasick interleaved state-transition matrix code, only to find it didn’t match all strings (!)
The bug was in the machine construction: the “prune backlinks” step in acism_create is (as-is) broken. I commented it out, and voilà it works.
I will update github once I’ve sorted out an SSL cert issue 😛
Thanks for the patience of those who took that code and stumbled on that bug; and special thanks to perfgao@gmail.com, who came up with the tiny example that identified the problem.

FWIW the prune-backlinks code will return; but it will check that the subtrees of targets of backlinks, not just the immediate children, must be a subset of the subtree of the source, in order to prune the link.

Posted in aho-corasick, algorithm, string search | Leave a comment

GMAKE cheap trick #4 for non-recursive make

One of the tedious sides of non-recursive make is that all makefiles share the same namespace. Every variable or pseudo-target name must be unique across all subprojects, or must be a well-known global variable/target, to which a subproject’s bits and pieces are appended. Appending to global variables doesn’t alway have the desired effect, either.

For example, these kinds of rules/assignments work (assuming EXPORT_PGMS is used and owned by global_rules.mk):

include global_rules.mk
EXPORT_PGMS += rsaencrypt
RSAENCRYPT_testfiles = foo.dat bar.dat

all : rsaencrypt
clean .PHONY: rsaencrypt.clean
rsaencrypt.clean :; rm -f ${RSAENCRYPT_testfiles }

But these do not:

CFLAGS = -g # problem: previous CFLAGS wiped out
CFLAGS += -Imock # oops: now ALL subprojects see this flag
clean:; rm -f test.out # fails; "clean:" actions can only be defined once
clean::; rm -f test.out # fails if "clean:" is defined anywhere.

A second tedious side is that the current working directory will not be the one where the Makefile and source files live. That means that every file reference must have a directory path. The obvious change makes the makefile harder to read, and hence more prone to mistakes …

rsaencrypt.source = ${rsaencrypt}/alpha.c ${rsaencrypt}/beta.c ${rsaencrypt}/gamma.c \
                    ${rsaencrypt}/delta.c ${rsaencrypt}/beta.c ${rsaencrypt}/epsilon.c \

One approach is:

rsaencrypt.source = alpha.c beta.c gamma.c delta.c epsilons.c digamma.c
rsaencrypt_source.zip : ${rsaencrypt.source:%=${mydir}/%}
# ... or without the extra variable:
rsaencrypt_source.zip : $(addprefix ${rsaencrypt}/, alpha.c beta.c gamma.c delta.c epsilons.c digamma.c)

However, there’s a little abbreviation trick that can help. All of my makefiles begin like this:

~ := rsaencrypt
. := $(word 1,${$_} .)

There’s nothing special about the choice of characters; any of $! $+ $- $, $` $_ will also work. The object is to have an unobtrusive single-character abbreviation, avoiding letters, digits and gmake’s own $@ $< $^ $* : = # .

The $(word...) expression sets $. to . if no parent makefile has set ${rsaencrypt}. Long variable names that are just following cross-project conventions can be prefixed with “$~…”:

$~source = $./alpha.c $./beta.c $./gamma.c $./delta.c $./epsilons.c $./digamma.c
$~source.zip : ${$~source}

Note the :=, not =. That means that, as sub-makefiles are included, the definitions of $~ and $. change from that point on. Which also means that include ${subdir}/submakefile lines must come at the bottom of the makefile, and successive include directives can’t use $. for the pathname. Well, nothing’s perfec. 🙂

Posted in make, non-recursive make | Leave a comment

GMAKE cheap trick #3

What’s the shortest way to define a GMAKE var that is exactly one space?

SPACE = $~ #

Posted in make, Uncategorized | Leave a comment

Quick update on ACISM

Thanks to Nicholas Sinnott-Armstrong, who suggested that the fast and compact Aho-Corasick interleaved state machine, on github could use an upgrade from 32-bit to 64-bit, to handle (say) over 40M strings — a bit beyond 32-bit ACISM’s 16MB limit. He tried doing that, and with a small performance tweak, I’ve updated github. The state machine is twice as large, of course. On a 64-bit processor, it should be a couple cycles faster, though, because the shifts and masks are now constants. Anybody else using ACISM for pattern sets that large ??

Posted in Uncategorized | 3 Comments

21st Century Programming Languages

When I was a co-op student, I programmed in a language that had lambdas, currying, user-defined operators (including assignment), dynamic compilation, and a wealth of other features. It was neat, efficient and readable. That languages was POP-2.

It was written in 1970.

Posted in Uncategorized | Tagged , | Leave a comment

“Unusual uses of SSE2” posted to github

In this month’s frenzy of putting source code out there in a usable form, I’ve posted source to github for the SSE2 implementations of string search, BNDM search, sorting [16] doubles, and bit-matrix transpose; plus some convenience tools for SSE2.

Posted in bit, bit matrix transpose, bit shift, ffs, SSE2, string search, Uncategorized | 2 Comments

A tiny asynchronous DNS client posted to github

There’s a world out there beyond the “bind” daemon … and this DNS client library can access it and show you how it works. This is my reference model for writing a tiny independent library plugin to another event-loop mainline, without callbacks. and without compromises. It also has a few good tricks for using a hash table as a cache — deletion by TTL is incremental. https://github.com/mischasan/madns

Posted in Uncategorized | Tagged | Leave a comment