Exercise: ETS with secondary indices

Wed Dec 11 15:06:05 CET 2019

The ordered_set optimization traversal with partially bound key
does only trigger on the MatchHead tuple and not the MatchConditions list.
So both the select calls will have the same result, but the second one will
traverse the entire search tree.
With a partially bound key like {Name, '_'} the total "lookup time" is roughly
 the same as doing one lookup per matching key.
With a partially bound key like {Name, '_', Other} the the total lookup time may
be longer
as it have to visit all keys matching {Name, '_', '_'} and check if Other also
matches.
/Sverker
On tis, 2019-12-10 at 20:26 +0100, Roberto Ostinelli wrote:
> Related, is there any difference in terms of performance in this case
> (ordered_set) between:
> 
> ets:select(groups, [{
>     {{Name, '$2'}, '_', '_', '_'},
>     [],
>     ['$2']
> }])
> 
> and
> 
> ets:select(groups, [{
>     {{$1, '$2'}, '_', '_', '_'},
>     [{'=:=', '$1', Name}],
>     ['$2']
> }])
> 
> On Tue, Dec 10, 2019 at 12:34 PM Roberto Ostinelli <ostinelli@REDACTED>
> wrote:
> > Sverker,
> > This looks very nice. I was considering using composite keys but I didn't
> > know that partial lookups would work.
> > 
> > Seems definitely cleaner, and if I understand correctly the lookup time will
> > be the same as with a single key, with the added benefit of the delete
> > functionalities (please correct me if I'm wrong).
> > 
> > Thank you so much,
> > r.
> > 
> > 
> > On Mon, Dec 9, 2019 at 8:15 PM Sverker Eriksson <sverker@REDACTED> wrote:
> > > Here is a variant of your second solution. But instead insert each
> > > element-group pair as a key
> > > and use ordered_set which has an optimization for traversal with partially
> > > bound key.
> > > 
> > > 
> > > ets:new(elements,[ordered_set,named_table]).
> > > ets:new(groups,[ordered_set,named_table]).
> > > 
> > > % Insert element
> > > [ets:insert(elements, {{Element,G}}) || G <- Groups],
> > > [ets:insert(groups, {{G,Element}}) || G <- Groups],
> > > 
> > > 
> > > % Lookup groups from element
> > > Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
> > > 
> > > % Lookup elements from group
> > > Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),
> > > 
> > > % Delete element
> > > Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
> > > ets:match_delete(elements, {{Element, '_'}}),
> > > [ets:delete(groups, {G,Element}) || G <- Groups],
> > > 
> > > 
> > > All select and match traversals use a partially bound key and will
> > > only search the part of the ordered_set (tree) that may contain such keys.
> > > 
> > > 
> > > /Sverker
> > > 
> > > On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
> > > > All,
> > > > I'm doing a conceptual exercise to evaluate the usage of ETS with
> > > > secondary indices.
> > > > 
> > > > The exercise is really simple:
> > > > There are elements which can belong to groups, in a m-to-n relationship
> > > > (every element can belong to one or more groups).
> > > > Elements and groups can be created/deleted at will.
> > > > We need to find the groups an element belongs to.
> > > > We need to find the elements included into a group.
> > > > We need to not keep an element if it does not belong to a group, nor a
> > > > group if it does not have elements in it (this is to avoid memory leaks
> > > > since groups / elements get added / removed).
> > > > All of these operations should be optimized as much as possible.
> > > > 
> > > > 
> > > > A. One possibility is to have 2 ETS tables of type set, that contain the
> > > > following terms:
> > > > 
> > > > 1. table elements with tuple format {Element, Groups}
> > > > 2. table groups with tuple format {Group, Elements}
> > > > 
> > > > Groups and Elements would be maps (so that finding an element of the map
> > > > does not require traversing an array).
> > > > 
> > > > Adding an Element to a Group would mean:
> > > > An insert in the table elements where the Group gets added to the Groups
> > > > map (bonus of using a map: no duplicates can be created).
> > > > An insert in the table groups where the Element gets added to the
> > > > Elements map.
> > > > Retrieving the groups an element belongs to is simple as getting the
> > > > Element in the table elements, the groups will be keys of the Groups
> > > > map.
> > > > Retrieving the elements a group contains to is simple as getting the
> > > > Group in the table groups, the elements will be keys of the Elements
> > > > map.
> > > > 
> > > > Deleting is far from being optimized though. For instance, deleting an
> > > > Element means:
> > > > Getting the Groups it belongs to with a lookup in the table elements.
> > > > For every Group in the Groups map, remove the Element from the Elements
> > > > map.
> > > > It becomes something like this:
> > > > 
> > > > case ets:lookup(elements, Element) of
> > > >     [{Element, Groups}] ->
> > > >         ets:delete(elements, Element),
> > > >         lists:foreach(fun({Group}) ->
> > > >             case ets:lookup(groups, Group) of
> > > >                 [{Group, Elements}] ->
> > > >                     case maps:remove(Element, Elements) of
> > > >                         Elements1 when map_size(Elements1) == 0 ->
> > > >                             ets:delete(groups, Group);
> > > >                         Elements1 ->
> > > >                             ets:insert(groups, {Group, Elements1})
> > > >                     end;
> > > >                 _ ->
> > > >                     ok
> > > >             end
> > > >         end, maps:keys(Groups))
> > > >     _ ->
> > > >         ok
> > > > end.
> > > > 
> > > > Ouch. Same goes for groups. And this is to delete one element, imagine
> > > > if bigger operations need to be made.
> > > > 
> > > > 
> > > > B. Another possibility would be to have 2 ETS tables of type bag, that
> > > > contain the following terms:
> > > > 
> > > > 1. table elements with tuple format {Element, Group}
> > > > 2. table groups with tuple format {Group, Element}
> > > > 
> > > > Adding an Element to a Group means:
> > > > An insert in the table elements of the tuple {Element, Group}.
> > > > An insert in the table groups of the tuple {Group, Element}.
> > > > Retrieving the groups an element belongs to requires an
> > > > ets:match_object/2 or similar such as:
> > > > ets:match_object(elements, {Element, _ = '_'}).
> > > > 
> > > > Retrieving the elements a group belongs to requires an
> > > > ets:match_object/2 or similar such as:
> > > > ets:match_object(groups, {Group, _ = '_'}).
> > > > 
> > > > So retrieving requires the traversing of a table, but it should be
> > > > relatively quick, given that the match is done on the index itself.
> > > > 
> > > > Deleting is not particularly optimized though, because it requires the
> > > > traversing of a table with elements that are not indexed. For instance,
> > > > deleting an Element would look something like:
> > > > ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
> > > > ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is
> > > > being traversed.
> > > > 
> > > > 
> > > > What would you do? Any takers are welcome ^^_
> > > > 
> > > > Cheers,
> > > > r.
> > > > 
> > > > 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20191211/13336f48/attachment.htm>