Exercise: ETS with secondary indices

Roberto Ostinelli ostinelli@REDACTED
Sun Dec 15 18:52:54 CET 2019


Thank you Sverker!

On Wed, Dec 11, 2019 at 3:06 PM Sverker Eriksson <sverker@REDACTED> wrote:

> The ordered_set optimization *traversal with partially bound key*
> does only trigger on the MatchHead tuple and not the MatchConditions list.
>
> So both the select calls will have the same result, but the second one
> will traverse the entire search tree.
>
>
> With a partially bound key like {Name, '_'} the total "lookup time" is
> roughly  the same as doing one lookup per matching key.
>
> With a partially bound key like {Name, '_', Other} the the total lookup
> time may be longer
> as it have to visit all keys matching {Name, '_', '_'} and check if Other
> also matches.
>
> /Sverker
>
> On tis, 2019-12-10 at 20:26 +0100, Roberto Ostinelli wrote:
>
> Related, is there any difference in terms of performance in this case (
> ordered_set) between:
>
> ets:select(groups, [{
>     {{Name, '$2'}, '_', '_', '_'},
>     [],
>     ['$2']
> }])
>
> and
>
> ets:select(groups, [{
>     {{$1, '$2'}, '_', '_', '_'},
>     [{'=:=', '$1', Name}],
>     ['$2']
> }])
>
> On Tue, Dec 10, 2019 at 12:34 PM Roberto Ostinelli <ostinelli@REDACTED>
> wrote:
>
> Sverker,
> This looks very nice. I was considering using composite keys but I didn't
> know that partial lookups would work.
>
> Seems definitely cleaner, and if I understand correctly the lookup time
> will be the same as with a single key, with the added benefit of the delete
> functionalities (please correct me if I'm wrong).
>
> Thank you so much,
> r.
>
>
> On Mon, Dec 9, 2019 at 8:15 PM Sverker Eriksson <sverker@REDACTED>
> wrote:
>
> Here is a variant of your second solution. But instead insert each
> element-group pair as a *key*
> and use ordered_set which has an optimization for *traversal with
> partially bound key*.
>
>
> ets:new(elements,[ordered_set,named_table]).
> ets:new(groups,[ordered_set,named_table]).
>
> % Insert element
> [ets:insert(elements, {{Element,G}}) || G <- Groups],
> [ets:insert(groups, {{G,Element}}) || G <- Groups],
>
>
> % Lookup groups from element
> Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
>
> % Lookup elements from group
> Elements = ets:select(groups, [{{{Group,'$1'}}, [], ['$1']}]),
>
> % Delete element
> Groups = ets:select(elements, [{{{Element,'$1'}}, [], ['$1']}]),
> ets:match_delete(elements, {{Element, '_'}}),
> [ets:delete(groups, {G,Element}) || G <- Groups],
>
>
> All select and match traversals use a partially bound key and will
> only search the part of the ordered_set (tree) that may contain such keys.
>
>
> /Sverker
>
> On lör, 2019-12-07 at 18:54 +0100, Roberto Ostinelli wrote:
>
> All,
> I'm doing a conceptual exercise to evaluate the usage of ETS with
> secondary indices.
>
> The exercise is really simple:
>
>    1. There are elements which can belong to groups, in a m-to-n
>    relationship (every element can belong to one or more groups).
>    2. Elements and groups can be created/deleted at will.
>    3. We need to find the groups an element belongs to.
>    4. We need to find the elements included into a group.
>    5. We need to not keep an element if it does not belong to a group,
>    nor a group if it does not have elements in it (this is to avoid memory
>    leaks since groups / elements get added / removed).
>
> All of these operations should be optimized as much as possible.
>
>
> A. One possibility is to have 2 ETS tables of type set, that contain the
> following terms:
>
> 1. table elements with tuple format {Element, Groups}
> 2. table groups with tuple format {Group, Elements}
>
> Groups and Elements would be maps (so that finding an element of the map
> does not require traversing an array).
>
> Adding an Element to a Group would mean:
>
>    - An insert in the table elements where the Group gets added to the
>    Groups map (bonus of using a map: no duplicates can be created).
>    - An insert in the table groups where the Element gets added to the
>    Elements map.
>
> Retrieving the groups an element belongs to is simple as getting the
> Element in the table elements, the groups will be keys of the Groups map.
> Retrieving the elements a group contains to is simple as getting the Group
> in the table groups, the elements will be keys of the Elements map.
>
> Deleting is far from being optimized though. For instance, deleting an
> Element means:
>
>    - Getting the Groups it belongs to with a lookup in the table elements.
>    - For every Group in the Groups map, remove the Element from the
>    Elements map.
>
> It becomes something like this:
>
> case ets:lookup(elements, Element) of
>     [{Element, Groups}] ->
>         ets:delete(elements, Element),
>         lists:foreach(fun({Group}) ->
>             case ets:lookup(groups, Group) of
>                 [{Group, Elements}] ->
>                     case maps:remove(Element, Elements) of
>                         Elements1 when map_size(Elements1) == 0 ->
>                             ets:delete(groups, Group);
>                         Elements1 ->
>                             ets:insert(groups, {Group, Elements1})
>                     end;
>                 _ ->
>                     ok
>             end
>         end, maps:keys(Groups))
>     _ ->
>         ok
> end.
>
> Ouch. Same goes for groups. And this is to delete *one* element, imagine
> if bigger operations need to be made.
>
>
> B. Another possibility would be to have 2 ETS tables of type bag, that
> contain the following terms:
>
> 1. table elements with tuple format {Element, Group}
> 2. table groups with tuple format {Group, Element}
>
> Adding an Element to a Group means:
>
>    - An insert in the table elements of the tuple {Element, Group}.
>    - An insert in the table groups of the tuple {Group, Element}.
>
> Retrieving the groups an element belongs to requires an ets:match_object/2
> or similar such as:
> ets:match_object(elements, {Element, _ = '_'}).
>
> Retrieving the elements a group belongs to requires an ets:match_object/2
> or similar such as:
> ets:match_object(groups, {Group, _ = '_'}).
>
> So retrieving requires the traversing of a table, but it should be
> relatively quick, given that the match is done on the index itself.
>
> Deleting is not particularly optimized though, because it requires the
> traversing of a table with elements that are not indexed. For instance,
> deleting an Element would look something like:
> ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
> ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being
> traversed.
>
>
> What would *you* do? Any takers are welcome ^^_
>
> Cheers,
> r.
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20191215/880b19d5/attachment.htm>


More information about the erlang-questions mailing list