[erlang-questions] random lookup in ets

Mon Aug 23 22:30:51 CEST 2010

On Mon, Aug 23, 2010 at 07:05:40AM +0200, Pascal Chapier wrote:
> first, if your application manage the ets, you can add an new table as 
> index, with an integer list as key, and the main ets key as value. then 
> you will have to select a random integer between min and max index key, 
> retreive the main ets key and then the value. The problem with this is 
> that you will have to keep the index keys continous, it is not so easy 
> and may takes some time when you make a delete operation. On the other 
> hand it should be fast for random read.

If you are keeping a secondary index you could do what Paul Mineiro
suggested I do with a similar problem.  In my problem I have non-overlapping
ranges of 32-bit integers.  For each range I have data associated with
the range.  Given an integer I wanted to look up the value associated
with the range.  I use an ordered_set ets table which contains tuples
of the form

{ StartOfRange, SecondaryIndexKey }

Then given an integer I do

{ StartOfRange, SecondaryIndexKey } =
  case ets:lookup (Index, IntIp) of
    [] ->
      case ets:lookup (Index, ets:prev (Index, IntIp)) of
        [] -> {0, -1};
        V -> hd (V)
      end;
    V -> hd (V)
  end,

To get back the SecondaryIndexKey (ets:prev will return the previous entry,
so using it when I don't exact match will give me the entry responsible
for the range).  This is extremely fast (I haven't tested it but something
like
[ lookup (random:uniform (trunc (math:pow (2,32))))
  || V <- lists:seq (1, 10000) ]

where lookup/1 actually looks up the SecondaryIndexKey, gets the ets value
out of the secondary table (which is a large binary object, my
SecondaryIndexKey are actually offsets into a file), turns the returned
binary into a record where integers are turned into string via tuple
lookup tables, and returned.  So doing lookup 10000 times with random
input takes about ~950 milliseconds on my laptop, so very, very fast :)

You could do the same thing to avoid keeping the keys continuous, as you
just want a random entry, so the previous entry is as good at the random
entry you went looking for.  Deletes are still hard because you have to
find the SecondaryIndexKey of the entry you are deleting from your primary
table, but maybe not too bad if you keep both a forward {Integer, Key} and
a backward {Key, Integer}.  Then when you delete the Key in your real table
you get the Key from the backward table, and then remove it from that table,
then remove from the forward table.

> second, you can use ets:tab2list(Ets), and ets:info(Ets,size), generate 
> X with random:uniform(Size): a random value between 1 and Size, and 
> finally take the Xth element of the list with lists:nth(X)..

This seems really slow with any sizable data, I wouldn't even attempt it
with the table I do the above on (that table has 10 million entries).

Hope that helps,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@REDACTED>