[erlang-questions] how: prefix matching on external term format

Paul Mineiro paul-trapexit@REDACTED
Mon Jun 2 19:17:27 CEST 2008


i'm now ready to attack my main use case for a disk-based ordered_set
erlang term store, namely, prefix matching on tuple/list keys.  i want to
be able to look up a match head such as { Bound, '_' } or [ Bound | '_' ]
very quickly.  (this is what we do all day at my job and we are currently
stuck with ets and we'd rather store some of this data on disk).

my tree consists of erlang terms in external format and sorted according
to erlang term order (via erl_compare_ext). unfortunately, tokyocabinet
does not export a search interface[1], only a prefix-matching operation
and a range operation.

so i've envisioned the following choices:

0. constructing the bytes corresponding to the prefix of the external
encoding up to the first variable encountered doesn't work; the prefix of
an externally formatted term is not a valid encoding (ever?), and
erl_compare_ext can potentially march along doing uninitialized reads
(erl_compare_ext doesn't take a length parameter for the buffers, it's
very trusting of the input).

1. there is no smallest or largest erlang term, unfortunately.  however i
can pick a "very small" and "very large" erlang term, replace variables in
the match head with such terms, query the tree using range, and then
search at both boundaries in case "very small" and "very large" are not
small or large enough (in the typical case, this will just involve looking
at one more record, so shouldn't be that bad).

2. i can fork erl_compare_ext and include a custom version in my code.  i
could take two unused tags and call them the smallest and largest erlang
term, then query the tree using range (or even prefix, since i can make my
custom erl_compare_ext safe to use on partial external term encodings
terminated by the special smallest or largest term tag; this would make
constructing the query cheaper).  of course this means the driver will
break everytime new type tags are added :( ... unless these special tags
are officially added to the external format to make these kind of
manipulations easier :)

i'm interested in better ideas.  for instance, my cursory inspection of
the ets implementation suggests using a search interface coupled with
C implementation of match specs to direct the search.  if this is the way
to go, i could contact the tokyocabinet maintainer and ask for a
search interface.

-- p

[1] by search interface, i mean, a way to search the tree according to a
user-specified function (which of course has to respect the existing
ordering of keys in the tree, in this case erlang term order).

In an artificial world, only extremists live naturally.

        -- Paul Graham



More information about the erlang-questions mailing list