Exercise: ETS with secondary indices
Roberto Ostinelli
ostinelli@REDACTED
Sun Dec 15 18:52:54 CET 2019
Thank you Sverker!
On Wed, Dec 11, 2019 at 3:06 PM Sverker Eriksson <sverker@REDACTED> wrote:
> The ordered_set optimization *traversal with partially bound key*
> does only trigger on the MatchHead tuple and not the MatchConditions list.
>
> So both the select calls will have the same result, but the second one
> will traverse the entire search tree.
>
>
> With a partially bound key like {Name, '_'} the total "lookup time" is
> roughly the same as doing one lookup per matching key.
>
> With a partially bound key like {Name, '_', Other} the the total lookup
> time may be longer
> as it have to visit all keys matching {Name, '_', '_'} and check if Other
> also matches.
>
> /Sverker
>
> On tis, 2019-12-10 at 20:26 +0100, Roberto Ostinelli wrote:
>
> Related, is there any difference in terms of performance in this case (
> ordered_set) between:
>
> ets:select(groups, [{
> {{Name, '$2'}, '_', '_', '_'},
> [],
> ['$2']
> }])
>
> and
>
> ets:select(groups, [{
> {{$1, '$2'}, '_', '_', '_'},
> [{'=:=', '$1', Name}],
> ['$2']
> }])
>
> On Tue, Dec 10, 2019 at 12:34 PM Roberto Ostinelli <ostinelli@REDACTED>
> wrote:
>
> Sverker,
> This looks very nice. I was considering using composite keys but I didn't
> know that partial lookups would work.
>
> Seems definitely cleaner, and if I understand correctly the lookup time
> will be the same as with a single key, with the added benefit of the delete
> functionalities (please correct me if I'm wrong).
>
> Thank you so much,
> r.
>
>
> On Mon, Dec 9, 2019 at 8:15 PM Sverker Eriksson <sverker@REDACTED>
> wrote:
>
> Here is a variant of your second solution. But instead insert each
> element-group pair as a *key*
> and use ordered_set which has an optimization for *traversal with
> partially bound key*.
>
>
> ets:new(elements,[ordered_set,named_table]).
> ets:new(groups,[ordered_set,named_table]).
>
> % Insert element
> [ets:insert(elements, {{Element,G}}) || G <- Groups],
> [ets:insert(groups, {{G,Element}}) || G <- Groups],
>
>
> % Lookup groups from element
> Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
>
> % Lookup elements from group
> Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),
>
> % Delete element
> Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
> ets:match_delete(elements, {{Element, '_'}}),
> [ets:delete(groups, {G,Element}) || G <- Groups],
>
>
> All select and match traversals use a partially bound key and will
> only search the part of the ordered_set (tree) that may contain such keys.
>
>
> /Sverker
>
> On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
>
> All,
> I'm doing a conceptual exercise to evaluate the usage of ETS with
> secondary indices.
>
> The exercise is really simple:
>
> 1. There are elements which can belong to groups, in a m-to-n
> relationship (every element can belong to one or more groups).
> 2. Elements and groups can be created/deleted at will.
> 3. We need to find the groups an element belongs to.
> 4. We need to find the elements included into a group.
> 5. We need to not keep an element if it does not belong to a group,
> nor a group if it does not have elements in it (this is to avoid memory
> leaks since groups / elements get added / removed).
>
> All of these operations should be optimized as much as possible.
>
>
> A. One possibility is to have 2 ETS tables of type set, that contain the
> following terms:
>
> 1. table elements with tuple format {Element, Groups}
> 2. table groups with tuple format {Group, Elements}
>
> Groups and Elements would be maps (so that finding an element of the map
> does not require traversing an array).
>
> Adding an Element to a Group would mean:
>
> - An insert in the table elements where the Group gets added to the
> Groups map (bonus of using a map: no duplicates can be created).
> - An insert in the table groups where the Element gets added to the
> Elements map.
>
> Retrieving the groups an element belongs to is simple as getting the
> Element in the table elements, the groups will be keys of the Groups map.
> Retrieving the elements a group contains to is simple as getting the Group
> in the table groups, the elements will be keys of the Elements map.
>
> Deleting is far from being optimized though. For instance, deleting an
> Element means:
>
> - Getting the Groups it belongs to with a lookup in the table elements.
> - For every Group in the Groups map, remove the Element from the
> Elements map.
>
> It becomes something like this:
>
> case ets:lookup(elements, Element) of
> [{Element, Groups}] ->
> ets:delete(elements, Element),
> lists:foreach(fun({Group}) ->
> case ets:lookup(groups, Group) of
> [{Group, Elements}] ->
> case maps:remove(Element, Elements) of
> Elements1 when map_size(Elements1) == 0 ->
> ets:delete(groups, Group);
> Elements1 ->
> ets:insert(groups, {Group, Elements1})
> end;
> _ ->
> ok
> end
> end, maps:keys(Groups))
> _ ->
> ok
> end.
>
> Ouch. Same goes for groups. And this is to delete *one* element, imagine
> if bigger operations need to be made.
>
>
> B. Another possibility would be to have 2 ETS tables of type bag, that
> contain the following terms:
>
> 1. table elements with tuple format {Element, Group}
> 2. table groups with tuple format {Group, Element}
>
> Adding an Element to a Group means:
>
> - An insert in the table elements of the tuple {Element, Group}.
> - An insert in the table groups of the tuple {Group, Element}.
>
> Retrieving the groups an element belongs to requires an ets:match_object/2
> or similar such as:
> ets:match_object(elements, {Element, _ = '_'}).
>
> Retrieving the elements a group belongs to requires an ets:match_object/2
> or similar such as:
> ets:match_object(groups, {Group, _ = '_'}).
>
> So retrieving requires the traversing of a table, but it should be
> relatively quick, given that the match is done on the index itself.
>
> Deleting is not particularly optimized though, because it requires the
> traversing of a table with elements that are not indexed. For instance,
> deleting an Element would look something like:
> ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
> ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being
> traversed.
>
>
> What would *you* do? Any takers are welcome ^^_
>
> Cheers,
> r.
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20191215/880b19d5/attachment.htm>
More information about the erlang-questions
mailing list