[erlang-questions] Better way to check if a set of keys exists in a mnesia table?

Chaitanya Chalasani cchalasani@REDACTED
Sun Jul 17 06:47:02 CEST 2016


On 16-Jul-2016, at 19:56, Mikael Pettersson <mikpelinux@REDACTED> wrote:
> all_keys can be horribly expensive and should be avoided if possible, but for small tables it may be acceptable.
> 
> I'd do one of the following:
> 
> 1. mnesia:dirty_read(T, K) and check result for [] vs [_|_]
>   Pro: easy, works
>   Con: the data copy may be expensive for large records

Yes Indeed.

> 
> 2. Make the table an ordered_set; mnesia:dirty_prev(T, mnesia:dirty_next(T, K)) and check if K is returned
>   Pro: avoids the data copy
>   Con: requires an ordered_set, requires code to handle boundary conditions wrt '$end_of_table’

Using UUID as primary key, the ordered_set might eventually slow down my writes.

> 
> 3. Store the keys w/o data in a separate table, then do a dirty_read in that
>   Pro: reduces copying 
>   Con: requires more storage, the lookup in the side table won't provide cache hints to help your access
>        in the main table (but that may be Ok if the side table is hit orders of magnitude more often)
> 
> One could implement some sort of sparse bitmap or range tree and use that to record key presence, but I'm
> not sure it would be worthwhile in Erlang.

Yes, I am looking into this possibility as Eric also has suggested the same approach. I can think of using bitmap if it doesn’t complicate the solution beyond the performance again.

Also, when I was going through my use case, I figured out the chance of a table being remote is rare enough to make peace with ets:member and tried to implement as shown below - 

is_key(Tname, Key) ->
  case catch ets:member(Tname, Key) of
    {'EXIT', _Reason} ->
      is_remote_key(Tname, Key);
    Boolean -> Boolean
  end.

is_remote_key(Tname, Key) ->
  case mnesia:dirty_read(Tname, Key) of
    [] -> false;
    _ -> true
  end.

are_all_keys(Tname, Keys) ->
  Fun = case mnesia:table_info(Tname, storage_type) of
          unknown -> fun is_remote_key/2;
          _ -> fun is_key/2
        end,
  are_all_keys(Tname, Keys, Fun).

are_all_keys(_Tname, [], _Fun) -> true;
are_all_keys(Tname, [Key|Keys], Fun) ->
  case Fun(Tname, Key) of
    false -> false;
    true -> are_all_keys(Tname, Keys, Fun)
  end.

Below are the latencies when checked with timer:tc. 

*** Table has a local copy ***
13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
{11,true}
14> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,1,1,2,3]]).
{14,true}
15> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{14,true}
16> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{13,true}
17> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{13,true}
18> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{14,true}

*** Table is remote ***
9> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
{975,true}
10> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{2151,true}
11> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{2003,true}
12> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{1898,true}
13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{2027,true}

Though I didn’t use UUIDs in my example I think this is optimized enough. Please suggest otherwise. 


/Chaitanya




More information about the erlang-questions mailing list