[erlang-questions] Better way to check if a set of keys exists in a mnesia table?

Bernard Duggan bernard@REDACTED
Mon Jul 18 01:56:44 CEST 2016


[Sorry for the duplicate, Chaitanya - I meant to reply to the list]

I may be missing something, but what's wrong with mneisa:[dirty_]select/2?

mnesia:dirty_select(my_table, [{#my_record{uuid=UUID, _='_'}, [], [true]}])

Pro: Works on any table type, no large row copying, no extra storage,
automatically uses index if available.
Cons: Only about 3 people on the planet can write matchspecs off the top of
their head.

will return [true] where the UUID is present, and [] otherwise. No issue
with big rows, nor with lots of entries.

If you're feeling really adventurous, you can even do all the UUIDs in one
call:

mnesia:dirty_select(my_table, [{#my_record{uuid=UUID, _='_'}, [], [true]}
|| UUID <- MyListOfUUIDs])

If the reuslt is [], none were present. If it's equal in length to
MyListOfUUIDs then they were all present.

B

On Sun, Jul 17, 2016 at 2:47 PM, Chaitanya Chalasani <cchalasani@REDACTED>
wrote:

> On 16-Jul-2016, at 19:56, Mikael Pettersson <mikpelinux@REDACTED> wrote:
> > all_keys can be horribly expensive and should be avoided if possible,
> but for small tables it may be acceptable.
> >
> > I'd do one of the following:
> >
> > 1. mnesia:dirty_read(T, K) and check result for [] vs [_|_]
> >   Pro: easy, works
> >   Con: the data copy may be expensive for large records
>
> Yes Indeed.
>
> >
> > 2. Make the table an ordered_set; mnesia:dirty_prev(T,
> mnesia:dirty_next(T, K)) and check if K is returned
> >   Pro: avoids the data copy
> >   Con: requires an ordered_set, requires code to handle boundary
> conditions wrt '$end_of_table’
>
> Using UUID as primary key, the ordered_set might eventually slow down my
> writes.
>
> >
> > 3. Store the keys w/o data in a separate table, then do a dirty_read in
> that
> >   Pro: reduces copying
> >   Con: requires more storage, the lookup in the side table won't provide
> cache hints to help your access
> >        in the main table (but that may be Ok if the side table is hit
> orders of magnitude more often)
> >
> > One could implement some sort of sparse bitmap or range tree and use
> that to record key presence, but I'm
> > not sure it would be worthwhile in Erlang.
>
> Yes, I am looking into this possibility as Eric also has suggested the
> same approach. I can think of using bitmap if it doesn’t complicate the
> solution beyond the performance again.
>
> Also, when I was going through my use case, I figured out the chance of a
> table being remote is rare enough to make peace with ets:member and tried
> to implement as shown below -
>
> is_key(Tname, Key) ->
>   case catch ets:member(Tname, Key) of
>     {'EXIT', _Reason} ->
>       is_remote_key(Tname, Key);
>     Boolean -> Boolean
>   end.
>
> is_remote_key(Tname, Key) ->
>   case mnesia:dirty_read(Tname, Key) of
>     [] -> false;
>     _ -> true
>   end.
>
> are_all_keys(Tname, Keys) ->
>   Fun = case mnesia:table_info(Tname, storage_type) of
>           unknown -> fun is_remote_key/2;
>           _ -> fun is_key/2
>         end,
>   are_all_keys(Tname, Keys, Fun).
>
> are_all_keys(_Tname, [], _Fun) -> true;
> are_all_keys(Tname, [Key|Keys], Fun) ->
>   case Fun(Tname, Key) of
>     false -> false;
>     true -> are_all_keys(Tname, Keys, Fun)
>   end.
>
> Below are the latencies when checked with timer:tc.
>
> *** Table has a local copy ***
> 13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
> {11,true}
> 14> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,1,1,2,3]]).
> {14,true}
> 15> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {14,true}
> 16> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {13,true}
> 17> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {13,true}
> 18> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {14,true}
>
> *** Table is remote ***
> 9> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3]]).
> {975,true}
> 10> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {2151,true}
> 11> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {2003,true}
> 12> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {1898,true}
> 13> timer:tc(mnesiaKeys, are_all_keys, [test, [1,2,3,2,3,1,2,3,1]]).
> {2027,true}
>
> Though I didn’t use UUIDs in my example I think this is optimized enough.
> Please suggest otherwise.
>
>
> /Chaitanya
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160718/9d31bb39/attachment.htm>


More information about the erlang-questions mailing list