[erlang-questions] Better way to check if a set of keys exists in a mnesia table?

Sat Jul 16 16:26:44 CEST 2016

Chaitanya Chalasani writes:
 > I have a table with a UUID as the primary key / first element of the record.
 > 
 > What is an efficient way to check whether a given set of UUIDs are valid primary keys for that table?
 > 
 > I can think of three different solutions:
 > use mnesia:all_keys(TableName) and perform a list subset check. However, if the table contains over a million records, fetching all the keys for every check isn’t a nice solution.
 > use mnesia:read(TableName, Key) and check the response. However, if the row is big enough, fetching the whole row for a simple key check isn’t that good either.
 > use ets:member(TableName, Key). A better solution than the above, but it doesn’t work on remote tables.
 > 
 > Which one of the above is the least bad solution, or is there a better one hidden in the documentation?

all_keys can be horribly expensive and should be avoided if possible, but for small tables it may be acceptable.

I'd do one of the following:

1. mnesia:dirty_read(T, K) and check result for [] vs [_|_]
   Pro: easy, works
   Con: the data copy may be expensive for large records

2. Make the table an ordered_set; mnesia:dirty_prev(T, mnesia:dirty_next(T, K)) and check if K is returned
   Pro: avoids the data copy
   Con: requires an ordered_set, requires code to handle boundary conditions wrt '$end_of_table'

3. Store the keys w/o data in a separate table, then do a dirty_read in that
   Pro: reduces copying 
   Con: requires more storage, the lookup in the side table won't provide cache hints to help your access
        in the main table (but that may be Ok if the side table is hit orders of magnitude more often)
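
A rough sketch of options 1 and 2, in case it helps. The module and
function names (key_check, exists_read/2, exists_ordered/2, all_exist/3)
are made up for illustration; only the mnesia:dirty_* calls are real API.
It assumes mnesia is running and, for option 2, that the table is an
ordered_set. Note that option 2 uses two separate dirty operations, so it
can give a wrong answer if another process inserts or deletes neighbouring
keys in between.

```erlang
-module(key_check).
-export([exists_read/2, exists_ordered/2, all_exist/3]).

%% Option 1: read the record and test for a non-empty result.
%% Works for any table type, but copies the whole record.
exists_read(Tab, Key) ->
    case mnesia:dirty_read(Tab, Key) of
        []    -> false;
        [_|_] -> true
    end.

%% Option 2: ordered_set tables only. Step to the key after Key, then
%% step back one key; we land on Key exactly when Key is present.
%% If there is no key after Key, the candidate is the last key instead.
exists_ordered(Tab, Key) ->
    Candidate =
        case mnesia:dirty_next(Tab, Key) of
            '$end_of_table' -> mnesia:dirty_last(Tab);
            Next            -> mnesia:dirty_prev(Tab, Next)
        end,
    %% On an empty table Candidate is '$end_of_table', so this is false.
    Candidate =:= Key.

%% Check a whole set of keys with either of the functions above,
%% stopping at the first missing key.
all_exist(Check, Tab, Keys) when is_function(Check, 2) ->
    lists:all(fun(Key) -> Check(Tab, Key) end, Keys).
```

For example, key_check:all_exist(fun key_check:exists_ordered/2, Tab, Uuids)
returns true only if every UUID in the list is a key of Tab. Option 3 needs
no extra code: it is exists_read/2 pointed at the keys-only side table.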

One could implement some sort of sparse bitmap or range tree and use that to record key presence, but I'm
not sure it would be worthwhile in Erlang.