[erlang-questions] Trie fun

Mon Jun 27 20:38:37 CEST 2011

Erlang's MR against a trie seems ideal for parallelizing a word filter.
However, the filter-terms becomes the bottleneck.

In many languages we'd put it into memory and acquire a
pointer/reference to it, spawn the worker threads and pass them the
address of the filter list.
This is efficient as it's static-data and only in memory in one place.

In Erlang we build up the trie and then send it messages, leading to a
bottleneck at the filter-list's message queue.
Okay, so we can spawn multiple filters, and ship words to them, but
every filter has to build up the filter-list from storage (compiled
in, in this case, to minimize overhead.)
Now we have 'm' words going to 'n' filters.  Except for the startup
costs, that doesn't seem too bad.

The Trie is static, prebuild it and store it as a binary "variable",
so it's just blasted into memory instead of actually being built?  Any
downsides? (Theory; I don't know if you
can "image" memory this way in Erlang yet.)

Is there a better way?  Yes, I could do it in C and link it in but I'm
interested in the Erlang way.  :)

Thanks!

-mox