Quick update on ACISM

Thanks to Nicholas Sinnott-Armstrong, who suggested that the fast and compact Aho-Corasick interleaved state machine, on github could use an upgrade from 32-bit to 64-bit, to handle (say) over 40M strings — a bit beyond 32-bit ACISM’s 16MB limit. He tried doing that, and with a small performance tweak, I’ve updated github. The state machine is twice as large, of course. On a 64-bit processor, it should be a couple cycles faster, though, because the shifts and masks are now constants. Anybody else using ACISM for pattern sets that large ??

Advertisements

About mischasan

I've had the privilege to work in a field where abstract thinking has concrete value. That applies at the macro level --- optimizing actions on terabyte database --- or the micro level --- fast parallel string searches in memory. You can find my documents on production-system radix sort (NOT just for academics!) and some neat little tricks for developers, on my blog https://mischasan.wordpress.com My e-mail sig (since 1976): Engineers think equations approximate reality. Physicists think reality approximates the equations. Mathematicians never make the connection.
This entry was posted in Uncategorized. Bookmark the permalink.

3 Responses to Quick update on ACISM

  1. Remusao says:

    Hi there,

    First of all thank you for ACISM, that I’m starting to use right now. The upgrade to 64 bits is interesting to me as I’m using it to match more strings than the 32 bits version could handle (I’ve tried a dictionary of around 20 Mo at the moment, and plan to increase the load soon). I didn’t make any bench or comparison between the two versions though, but I’ve found the 64 bits version very very fast for my use case.

    Thanks again,
    Rémi

  2. Serge says:

    Hello!
    Thank you for sharing the excellent and useful code of ACISM!
    You’ve written:
    > Searches byte vectors, not null-terminated strings. Suitable for searching machine code as much as searching text.
    Have I got it right, that zero bytes in the patterns are supported (UTF-8 strings, for example)?
    Thank you,
    Serge.

    • mischasan says:

      Yes, it pays no particular attention to zero bytes (or any other particular byte value).
      { ‘A’, ‘B’, 0xFF, ‘\0’ } and { ‘D’, ‘\0’, ‘\0’, ‘D’ } are just two 4-byte blocks, as search patterns or as targets for seach.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s