Exercise: ETS with secondary indices

Mon Dec 9 20:15:41 CET 2019

Here is a variant of your second solution. But instead insert each element-group 
pair as a key
and use ordered_set which has an optimization for traversal with partially bound
key.
ets:new(elements,[ordered_set,named_table]).
ets:new(groups,[ordered_set,named_table]).
% Insert element
[ets:insert(elements, {{Element,G}}) || G <- Groups],
[ets:insert(groups, {{G,Element}}) || G <- Groups],
% Lookup groups from element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
% Lookup elements from group
Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),
% Delete element
Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
ets:match_delete(elements, {{Element, '_'}}),
[ets:delete(groups, {G,Element}) || G <- Groups],
All select and match traversals use a partially bound key and will
only search the part of the ordered_set (tree) that may contain such keys.
/Sverker
On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
> All,
> I'm doing a conceptual exercise to evaluate the usage of ETS with secondary
> indices.
> 
> The exercise is really simple:
> There are elements which can belong to groups, in a m-to-n relationship (every
> element can belong to one or more groups).
> Elements and groups can be created/deleted at will.
> We need to find the groups an element belongs to.
> We need to find the elements included into a group.
> We need to not keep an element if it does not belong to a group, nor a group
> if it does not have elements in it (this is to avoid memory leaks since groups
> / elements get added / removed).
> All of these operations should be optimized as much as possible.
> 
> 
> A. One possibility is to have 2 ETS tables of type set, that contain the
> following terms:
> 
> 1. table elements with tuple format {Element, Groups}
> 2. table groups with tuple format {Group, Elements}
> 
> Groups and Elements would be maps (so that finding an element of the map does
> not require traversing an array).
> 
> Adding an Element to a Group would mean:
> An insert in the table elements where the Group gets added to the Groups map
> (bonus of using a map: no duplicates can be created).
> An insert in the table groups where the Element gets added to the Elements
> map.
> Retrieving the groups an element belongs to is simple as getting the Element
> in the table elements, the groups will be keys of the Groups map.
> Retrieving the elements a group contains to is simple as getting the Group in
> the table groups, the elements will be keys of the Elements map.
> 
> Deleting is far from being optimized though. For instance, deleting an Element
> means:
> Getting the Groups it belongs to with a lookup in the table elements.
> For every Group in the Groups map, remove the Element from the Elements map.
> It becomes something like this:
> 
> case ets:lookup(elements, Element) of
>     [{Element, Groups}] ->
>         ets:delete(elements, Element),
>         lists:foreach(fun({Group}) ->
>             case ets:lookup(groups, Group) of
>                 [{Group, Elements}] ->
>                     case maps:remove(Element, Elements) of
>                         Elements1 when map_size(Elements1) == 0 ->
>                             ets:delete(groups, Group);
>                         Elements1 ->
>                             ets:insert(groups, {Group, Elements1})
>                     end;
>                 _ ->
>                     ok
>             end
>         end, maps:keys(Groups))
>     _ ->
>         ok
> end.
> 
> Ouch. Same goes for groups. And this is to delete one element, imagine if
> bigger operations need to be made.
> 
> 
> B. Another possibility would be to have 2 ETS tables of type bag, that contain
> the following terms:
> 
> 1. table elements with tuple format {Element, Group}
> 2. table groups with tuple format {Group, Element}
> 
> Adding an Element to a Group means:
> An insert in the table elements of the tuple {Element, Group}.
> An insert in the table groups of the tuple {Group, Element}.
> Retrieving the groups an element belongs to requires an ets:match_object/2 or
> similar such as:
> ets:match_object(elements, {Element, _ = '_'}).
> 
> Retrieving the elements a group belongs to requires an ets:match_object/2 or
> similar such as:
> ets:match_object(groups, {Group, _ = '_'}).
> 
> So retrieving requires the traversing of a table, but it should be relatively
> quick, given that the match is done on the index itself.
> 
> Deleting is not particularly optimized though, because it requires the
> traversing of a table with elements that are not indexed. For instance,
> deleting an Element would look something like:
> ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
> ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being
> traversed.
> 
> 
> What would you do? Any takers are welcome ^^_
> 
> Cheers,
> r.
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20191209/22faf9fd/attachment.htm>