[erlang-questions] random lookup in ets

Thu Aug 26 18:57:04 CEST 2010

Hello

It is a
funny usage of the previous function in ordered set! 

I would
have bet that it produces an exception in case of

non existing
key (like it does with simple set).

I tried it
in a shell, and it works (almost as in your example).

##############################
1> T = ets:new(table,[ordered_set,public,named_table]).

table

2> ets:tab2list(table).

[]

3> ets:insert(table,[{1,"one"},{10,"ten"},{5,"five"},{50,"fifty"}]).

true

4> ets:tab2list(table).                                             

[{1,"one"},{5,"five"},{10,"ten"},{50,"fifty"}]

5> ets:prev(table,10).

5

6> ets:prev(table,9).

5

7> ets:prev(table,1).

'$end_of_table'

8> ets:prev(table,1000000).

50

9> T1 = ets:new(table1,[set,public,named_table]).

table1

10> ets:insert(table1,[{1,"one"},{10,"ten"},{5,"five"},{50,"fifty"}]).

true

11> ets:tab2list(table1).                                             

[{1,"one"},{10,"ten"},{50,"fifty"},{5,"five"}]

12> ets:prev(table1,10).

1

13> ets:prev(table1,1).

'$end_of_table'

14> ets:prev(table1,9).

** exception
error: bad argument

     in function  ets:prev/2

        called as ets:prev(table1,9)

15> 

##############################

This method
is quite easy to implement and should be fast, 

but it does
not guarantee that all keys have the same probability to be selected, 

because the
interval between keys is not a constant.

In my
example, if you select a random value between 1 and 60 (included) you will get:

'$end_of_table'
-> 1.67%

1               -> 6.67%     

5               -> 8.33%

10              -> 66.67%

50              -> 16.67%

§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§ 

I have
though about the solution with a complementary index table. I could be
interesting,

if the
initial table is formed by {PrimKey,Value} to build 2 tables:

primTable
contents {PrimKey,Index,Value},

index
contents {Index,PrimKey} (and is an ordered_set).

then: 

add(NewPrimKey,NewValue) ->
                             NewIndex = ets:info(index,size) + 1,                             ets:insert(index,{NewIndex,NewPrimKey}),                             ets:insert(primTable,{NewPrimKey,NewIndex,NewValue}).

delete(PrimKey) ->  
                             [{PrimKey,Index,Value}] = ets:lookup(primTable,PrimKey),                             % warning iterate in the list if primTable is a bag

                             delete(primTable,Primkey),

                             LastIndex = ets:info(index,size),

                             case (LastIndex == Index) of

                                 false
-> [{LastIndex,LastKey}] = ets:lookup(index,LastIndex),

                                             ets:insert(index,{Index,LastKey}),

                                             [{LastKey,LastIndex,Value}] = ets:lookup(primTable,LastKey),

                                             ets:insert(primTable,{LastKey,Index,Value});

                                 true
-> ok

                             end,

                             delete(index,LastIndex). 

The order
of elements in the tables changes, but it should have no impact on
probabilities.

In case of
parallel accesses to the table, the function must be serialized, and (may be) use

ets:safe_fixtable (and the code should be debbugged, I didn't try it... :o)

 §§§§§§§§§§§§§§§§§§§§§§§§§§§§§§

The simplest
solution 

>>   "second, you can use ets:tab2list(Ets),
and ets:info(Ets,size), generate 

>>   X with random:uniform(Size): a random value
between 1 and Size, and 

>>   finally take the Xth element of the list
with lists:nth(X).."

can be
accelerated, even if it will still be slow with big tables:

Size = ets:info(Ets,size),

X = random:uniform(Size),

ets:lookup(get(Ets,X)).

get(Ets,N)
-> get(Ets,N-1,ets:first(Ets)).

get(_,0,K)
-> K;

get(Ets,N,K)
-> get(Ets,N-1,ets:next(Ets,K)).

CU.