[erlang-questions] Better way to check if a set of keys exists in a mnesia table?
Bernard Duggan
bernard@REDACTED
Mon Jul 18 01:56:44 CEST 2016
[Sorry for the duplicate, Chaitanya - I meant to reply to the list]
I may be missing something, but what's wrong with mnesia:[dirty_]select/2?
mnesia:dirty_select(my_table, [{#my_record{uuid=UUID, _='_'}, [], [true]}])
will return [true] where the UUID is present, and [] otherwise. No issue
with big rows, nor with lots of entries.
Pro: Works on any table type, no large row copying, no extra storage,
automatically uses index if available.
Cons: Only about 3 people on the planet can write matchspecs off the top of
their head.
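For anyone who'd rather not retype the matchspec each time, here is a minimal sketch of that single-key check wrapped in a helper. The module name, the payload field and the uuid_exists/1 function are made up for illustration; it assumes my_table was created with {record_name, my_record} and that mnesia is already running.

```erlang
-module(uuid_check).
-export([uuid_exists/1]).

%% Hypothetical record layout; only the uuid field matters here.
-record(my_record, {uuid, payload}).

%% Returns true if at least one row in my_table carries the given UUID.
%% The matchspec body is [true], so the row itself is never copied out.
uuid_exists(UUID) ->
    MatchSpec = [{#my_record{uuid = UUID, _ = '_'}, [], [true]}],
    case mnesia:dirty_select(my_table, MatchSpec) of
        []         -> false;
        [true | _] -> true
    end.
```

The [true | _] clause is only there so the helper also copes with a bag table, where several rows may match.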
If you're feeling really adventurous, you can even do all the UUIDs in one
call:
mnesia:dirty_select(my_table, [{#my_record{uuid=UUID, _='_'}, [], [true]}
|| UUID <- MyListOfUUIDs])
If the result is [], none were present. If it's equal in length to
MyListOfUUIDs (assuming that list has no duplicates) then they were all
present.
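A rough sketch of the "all in one call" variant as a function, in case it helps. The module and function names are invented for the example, and it assumes my_table is a set keyed on uuid (so each UUID matches at most one row) with record_name my_record. It removes duplicates from the input first, since each row contributes at most one result and the length comparison would otherwise come up short.

```erlang
-module(uuid_check_all).
-export([all_uuids_exist/1]).

%% Hypothetical record layout; only the uuid field matters here.
-record(my_record, {uuid, payload}).

%% True if every UUID in the list has a row in my_table.
%% The empty list is handled separately so we never pass an
%% empty matchspec to mnesia.
all_uuids_exist([]) ->
    true;
all_uuids_exist(UUIDs) ->
    Unique = lists:usort(UUIDs),
    %% One matchspec clause per UUID; each matching row yields one 'true'.
    MatchSpec = [{#my_record{uuid = U, _ = '_'}, [], [true]} || U <- Unique],
    Found = mnesia:dirty_select(my_table, MatchSpec),
    length(Found) =:= length(Unique).
```

If uuid is a secondary (non-unique) field rather than the key, the count comparison no longer holds and you'd need to return the UUIDs themselves and compare sets instead.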
B
On Sun, Jul 17, 2016 at 2:47 PM, Chaitanya Chalasani <cchalasani@REDACTED>
wrote:
> On 16-Jul-2016, at 19:56, Mikael Pettersson <mikpelinux@REDACTED> wrote:
> > all_keys can be horribly expensive and should be avoided if possible,
> > but for small tables it may be acceptable.
> >
> > I'd do one of the following:
> >
> > 1. mnesia:dirty_read(T, K) and check result for [] vs [_|_]
> > Pro: easy, works
> > Con: the data copy may be expensive for large records
>
> Yes Indeed.
>
> >
> > 2. Make the table an ordered_set; mnesia:dirty_prev(T,
> > mnesia:dirty_next(T, K)) and check if K is returned
> > Pro: avoids the data copy
> > Con: requires an ordered_set, requires code to handle boundary
> > conditions wrt '$end_of_table'
>
> Using UUID as primary key, the ordered_set might eventually slow down my
> writes.
>
> >
> > 3. Store the keys w/o data in a separate table, then do a dirty_read in
> > that
> > Pro: reduces copying
> > Con: requires more storage, the lookup in the side table won't provide
> > cache hints to help your access
> > in the main table (but that may be Ok if the side table is hit
> > orders of magnitude more often)
> >
> > One could implement some sort of sparse bitmap or range tree and use
> > that to record key presence, but I'm
> > not sure it would be worthwhile in Erlang.
>
> Yes, I am looking into this possibility as Eric also has suggested the
> same approach. I can think of using a bitmap if it doesn’t complicate the
> solution beyond the performance gain.
>
> Also, when I was going through my use case, I figured out the chance of a
> table being remote is rare enough to make peace with ets:member and tried
> to implement as shown below -
>
> is_key(Tname, Key) ->
>     case catch ets:member(Tname, Key) of
>         {'EXIT', _Reason} ->
>             is_remote_key(Tname, Key);
>         Boolean -> Boolean
>     end.
>
> is_remote_key(Tname, Key) ->
>     case mnesia:dirty_read(Tname, Key) of
>         [] -> false;
>         _ -> true
>     end.
>
> are_all_keys(Tname, Keys) ->
>     Fun = case mnesia:table_info(Tname, storage_type) of
>               unknown -> fun is_remote_key/2;
>               _ -> fun is_key/2
>           end,
>     are_all_keys(Tname, Keys, Fun).
>
> are_all_keys(_Tname, [], _Fun) -> true;
> are_all_keys(Tname, [Key|Keys], Fun) ->
>     case Fun(Tname, Key) of
>         false -> false;
>         true -> are_all_keys(Tname, Keys, Fun)
>     end.
>
> Below are the latencies when checked with timer:tc.
>
> *** Table has a local copy ***
> 13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
> {11,true}
> 14> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,1,1,2,3]]).
> {14,true}
> 15> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {14,true}
> 16> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {13,true}
> 17> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {13,true}
> 18> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {14,true}
>
> *** Table is remote ***
> 9> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
> {975,true}
> 10> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {2151,true}
> 11> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {2003,true}
> 12> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {1898,true}
> 13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {2027,true}
>
> Though I didn’t use UUIDs in my example I think this is optimized enough.
> Please suggest otherwise.
>
>
> /Chaitanya
>