[erlang-questions] random lookup in ets

Robert Virding rvirding@REDACTED
Tue Aug 24 15:21:17 CEST 2010


On 23 August 2010 22:30, Anthony Molinaro <anthonym@REDACTED> wrote:
> On Mon, Aug 23, 2010 at 07:05:40AM +0200, Pascal Chapier wrote:
>> first, if your application manages the ets table, you can add a new
>> table as an index, with consecutive integers as keys and the main ets
>> key as value. Then you select a random integer between the min and max
>> index key, retrieve the main ets key, and then the value. The problem
>> with this is that you have to keep the index keys contiguous, which is
>> not so easy and may take some time when you do a delete operation. On
>> the other hand, it should be fast for random reads.
>
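A minimal sketch of the index-table idea described above, under some assumptions: the module name `dense_index` and its functions are hypothetical, the main data table is left out, and deletion keeps the integer keys contiguous by moving the last entry into the freed slot. That is one possible way to do it, not something the quoted message specifies. It uses `rand:uniform/1`, which is available in recent OTP releases.

```erlang
-module(dense_index).
-export([new/0, add/2, delete/2, random_key/1]).

%% Two tables: position -> main key, and main key -> position.
new() ->
    {ets:new(by_pos, [set]), ets:new(by_key, [set])}.

%% Append Key at position Size + 1 unless it is already indexed.
add({ByPos, ByKey}, Key) ->
    case ets:member(ByKey, Key) of
        true ->
            ok;
        false ->
            N = ets:info(ByPos, size) + 1,
            ets:insert(ByPos, {N, Key}),
            ets:insert(ByKey, {Key, N}),
            ok
    end.

%% Remove Key and move the last entry into its slot, so the
%% positions stay contiguous in 1..Size.
delete({ByPos, ByKey}, Key) ->
    case ets:lookup(ByKey, Key) of
        [] ->
            ok;
        [{Key, N}] ->
            Last = ets:info(ByPos, size),
            [{Last, LastKey}] = ets:lookup(ByPos, Last),
            ets:insert(ByPos, {N, LastKey}),
            ets:insert(ByKey, {LastKey, N}),
            ets:delete(ByPos, Last),
            ets:delete(ByKey, Key),
            ok
    end.

%% Pick a uniformly random main key, or 'none' if the index is empty.
random_key({ByPos, _ByKey}) ->
    case ets:info(ByPos, size) of
        0 ->
            none;
        Size ->
            [{_, Key}] = ets:lookup(ByPos, rand:uniform(Size)),
            {ok, Key}
    end.
```

With this layout a delete costs a constant number of ets operations, at the price of a second table mapping each main key back to its position.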
> If you are keeping a secondary index you could do what Paul Mineiro
> suggested I do with a similar problem.  In my problem I have non-overlapping
> ranges of 32-bit integers.  For each range I have data associated with
> the range.  Given an integer I wanted to look up the value associated
> with the range.  I use an ordered_set ets table which contains tuples
> of the form
>
> { StartOfRange, SecondaryIndexKey }
>
> Then given an integer I do
>
> { StartOfRange, SecondaryIndexKey } =
>  case ets:lookup (Index, IntIp) of
>    [] ->
>      case ets:lookup (Index, ets:prev (Index, IntIp)) of
>        [] -> {0, -1};
>        V -> hd (V)
>      end;
>    V -> hd (V)
>  end,
>
> This gets back the SecondaryIndexKey: when there is no exact match,
> ets:prev returns the previous key, which is the start of the range
> responsible for the integer.  This is extremely fast.  I haven't tested
> this exact snippet, but something like
>
> [ lookup (random:uniform (1 bsl 32))
>  || _ <- lists:seq (1, 10000) ]
>
> where lookup/1 looks up the SecondaryIndexKey, gets the ets value out
> of the secondary table (which is a large binary object; my
> SecondaryIndexKeys are actually offsets into a file), and turns the
> returned binary into a record in which integers are converted to
> strings via tuple lookup tables.  Doing lookup 10000 times with random
> input takes about 950 milliseconds on my laptop, so very, very fast :)
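For reference, the lookup quoted above can be sketched as a small standalone module. This is only an illustration: the module name `range_floor` and the `none` return value are assumptions, and the table stores plain `{StartOfRange, Value}` tuples rather than the file offsets mentioned in the quoted message.

```erlang
-module(range_floor).
-export([new/1, lookup/2]).

%% Build an ordered_set table from a list of {StartOfRange, Value} tuples.
new(Ranges) ->
    Tab = ets:new(range_floor, [ordered_set]),
    true = ets:insert(Tab, Ranges),
    Tab.

%% Return the entry with the greatest start =< Int, or 'none' if Int
%% is smaller than every start in the table.
lookup(Tab, Int) ->
    case ets:lookup(Tab, Int) of
        [Entry] ->
            Entry;
        [] ->
            %% In an ordered_set, ets:prev/2 also accepts a key that is
            %% not present and returns the nearest smaller key.
            case ets:prev(Tab, Int) of
                '$end_of_table' ->
                    none;
                Start ->
                    [Entry] = ets:lookup(Tab, Start),
                    Entry
            end
    end.
```

For example, with ranges starting at 0, 100 and 200, `lookup(Tab, 150)` returns the entry that starts at 100.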

This solution assumes that the ranges are adjacent to one another and
that every possible index is a member of some range. Was this specified
in the problem?
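If the ranges are not adjacent, one possible variation (a sketch only, not something proposed in the thread) is to store the end of each range as well and check it after the ets:prev step. The module name `range_gaps`, the `{Start, End, Value}` tuple layout, and the return values are all assumptions made for illustration.

```erlang
-module(range_gaps).
-export([new/1, lookup/2]).

%% Build an ordered_set table from {Start, End, Value} tuples, keyed on
%% Start. Ranges are inclusive and assumed not to overlap.
new(Ranges) ->
    Tab = ets:new(range_gaps, [ordered_set]),
    true = ets:insert(Tab, Ranges),
    Tab.

%% Find the range whose start is the greatest one =< Int, then check
%% that Int does not fall past the end of that range.
lookup(Tab, Int) ->
    Start = case ets:member(Tab, Int) of
                true  -> Int;
                false -> ets:prev(Tab, Int)
            end,
    %% If ets:prev/2 returned '$end_of_table', the lookup yields [].
    case ets:lookup(Tab, Start) of
        [{_, End, Value}] when Int =< End -> {ok, Value};
        _ -> not_found
    end.
```

An integer that falls in a gap between two ranges then yields `not_found` instead of the value of the preceding range.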

Robert

