[erlang-questions] Better way to check if a set of keys exists in a mnesia table?

Chaitanya Chalasani cchalasani@REDACTED
Sun Jul 17 06:47:02 CEST 2016

On 16-Jul-2016, at 19:56, Mikael Pettersson <mikpelinux@REDACTED> wrote:
> all_keys can be horribly expensive and should be avoided if possible, but for small tables it may be acceptable.
> I'd do one of the following:
> 1. mnesia:dirty_read(T, K) and check result for [] vs [_|_]
>   Pro: easy, works
>   Con: the data copy may be expensive for large records

Yes Indeed.

> 2. Make the table an ordered_set; mnesia:dirty_prev(T, mnesia:dirty_next(T, K)) and check if K is returned
>   Pro: avoids the data copy
>   Con: requires an ordered_set, requires code to handle boundary conditions wrt '$end_of_table’

Using UUID as primary key, the ordered_set might eventually slow down my writes.

> 3. Store the keys w/o data in a separate table, then do a dirty_read in that
>   Pro: reduces copying 
>   Con: requires more storage, the lookup in the side table won't provide cache hints to help your access
>        in the main table (but that may be Ok if the side table is hit orders of magnitude more often)
> One could implement some sort of sparse bitmap or range tree and use that to record key presence, but I'm
> not sure it would be worthwhile in Erlang.

Yes, I am looking into this possibility as Eric also has suggested the same approach. I can think of using bitmap if it doesn’t complicate the solution beyond the performance again.

Also, when I was going through my use case, I figured out the chance of a table being remote is rare enough to make peace with ets:member and tried to implement as shown below - 

is_key(Tname, Key) ->
  case catch ets:member(Tname, Key) of
    {'EXIT', _Reason} ->
      is_remote_key(Tname, Key);
    Boolean -> Boolean

is_remote_key(Tname, Key) ->
  case mnesia:dirty_read(Tname, Key) of
    [] -> false;
    _ -> true

are_all_keys(Tname, Keys) ->
  Fun = case mnesia:table_info(Tname, storage_type) of
          unknown -> fun is_remote_key/2;
          _ -> fun is_key/2
  are_all_keys(Tname, Keys, Fun).

are_all_keys(_Tname, [], _Fun) -> true;
are_all_keys(Tname, [Key|Keys], Fun) ->
  case Fun(Tname, Key) of
    false -> false;
    true -> are_all_keys(Tname, Keys, Fun)

Below are the latencies when checked with timer:tc. 

*** Table has a local copy ***
13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
14> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,1,1,2,3]]).
15> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
16> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
17> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
18> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).

*** Table is remote ***
9> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
10> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
11> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
12> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).

Though I didn’t use UUIDs in my example I think this is optimized enough. Please suggest otherwise. 


More information about the erlang-questions mailing list