[erlang-questions] Better way to check if a set of keys exists in a mnesia table?
Chaitanya Chalasani
cchalasani@REDACTED
Sun Jul 17 06:47:02 CEST 2016
On 16-Jul-2016, at 19:56, Mikael Pettersson <mikpelinux@REDACTED> wrote:
> all_keys can be horribly expensive and should be avoided if possible, but for small tables it may be acceptable.
>
> I'd do one of the following:
>
> 1. mnesia:dirty_read(T, K) and check result for [] vs [_|_]
> Pro: easy, works
> Con: the data copy may be expensive for large records
Yes Indeed.
>
> 2. Make the table an ordered_set; mnesia:dirty_prev(T, mnesia:dirty_next(T, K)) and check if K is returned
> Pro: avoids the data copy
> Con: requires an ordered_set, requires code to handle boundary conditions wrt '$end_of_table’
Using UUID as primary key, the ordered_set might eventually slow down my writes.
>
> 3. Store the keys w/o data in a separate table, then do a dirty_read in that
> Pro: reduces copying
> Con: requires more storage, the lookup in the side table won't provide cache hints to help your access
> in the main table (but that may be Ok if the side table is hit orders of magnitude more often)
>
> One could implement some sort of sparse bitmap or range tree and use that to record key presence, but I'm
> not sure it would be worthwhile in Erlang.
Yes, I am looking into this possibility as Eric also has suggested the same approach. I can think of using bitmap if it doesn’t complicate the solution beyond the performance again.
Also, when I was going through my use case, I figured out the chance of a table being remote is rare enough to make peace with ets:member and tried to implement as shown below -
is_key(Tname, Key) ->
case catch ets:member(Tname, Key) of
{'EXIT', _Reason} ->
is_remote_key(Tname, Key);
Boolean -> Boolean
end.
is_remote_key(Tname, Key) ->
case mnesia:dirty_read(Tname, Key) of
[] -> false;
_ -> true
end.
are_all_keys(Tname, Keys) ->
Fun = case mnesia:table_info(Tname, storage_type) of
unknown -> fun is_remote_key/2;
_ -> fun is_key/2
end,
are_all_keys(Tname, Keys, Fun).
are_all_keys(_Tname, [], _Fun) -> true;
are_all_keys(Tname, [Key|Keys], Fun) ->
case Fun(Tname, Key) of
false -> false;
true -> are_all_keys(Tname, Keys, Fun)
end.
Below are the latencies when checked with timer:tc.
*** Table has a local copy ***
13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
{11,true}
14> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,1,1,2,3]]).
{14,true}
15> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{14,true}
16> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{13,true}
17> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{13,true}
18> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{14,true}
*** Table is remote ***
9> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
{975,true}
10> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{2151,true}
11> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{2003,true}
12> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{1898,true}
13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
{2027,true}
Though I didn’t use UUIDs in my example I think this is optimized enough. Please suggest otherwise.
/Chaitanya
More information about the erlang-questions
mailing list