Exercise: ETS with secondary indices

Sat Dec 7 18:54:48 CET 2019

All,
I'm doing a conceptual exercise to evaluate the usage of ETS with secondary
indices.

The exercise is really simple:

   1. There are elements which can belong to groups, in a m-to-n
   relationship (every element can belong to one or more groups).
   2. Elements and groups can be created/deleted at will.
   3. We need to find the groups an element belongs to.
   4. We need to find the elements included into a group.
   5. We need to not keep an element if it does not belong to a group, nor
   a group if it does not have elements in it (this is to avoid memory leaks
   since groups / elements get added / removed).

All of these operations should be optimized as much as possible.

A. One possibility is to have 2 ETS tables of type set, that contain the
following terms:

1. table elements with tuple format {Element, Groups}
2. table groups with tuple format {Group, Elements}

Groups and Elements would be maps (so that finding an element of the map
does not require traversing an array).

Adding an Element to a Group would mean:

   - An insert in the table elements where the Group gets added to the
   Groups map (bonus of using a map: no duplicates can be created).
   - An insert in the table groups where the Element gets added to the
   Elements map.

Retrieving the groups an element belongs to is simple as getting the Element
in the table elements, the groups will be keys of the Groups map.
Retrieving the elements a group contains to is simple as getting the Group
in the table groups, the elements will be keys of the Elements map.

Deleting is far from being optimized though. For instance, deleting an
Element means:

   - Getting the Groups it belongs to with a lookup in the table elements.
   - For every Group in the Groups map, remove the Element from the Elements
   map.

It becomes something like this:

case ets:lookup(elements, Element) of
    [{Element, Groups}] ->
        ets:delete(elements, Element),
        lists:foreach(fun({Group}) ->
            case ets:lookup(groups, Group) of
                [{Group, Elements}] ->
                    case maps:remove(Element, Elements) of
                        Elements1 when map_size(Elements1) == 0 ->
                            ets:delete(groups, Group);
                        Elements1 ->
                            ets:insert(groups, {Group, Elements1})
                    end;
                _ ->
                    ok
            end
        end, maps:keys(Groups))
    _ ->
        ok
end.

Ouch. Same goes for groups. And this is to delete *one* element, imagine if
bigger operations need to be made.

B. Another possibility would be to have 2 ETS tables of type bag, that
contain the following terms:

1. table elements with tuple format {Element, Group}
2. table groups with tuple format {Group, Element}

Adding an Element to a Group means:

   - An insert in the table elements of the tuple {Element, Group}.
   - An insert in the table groups of the tuple {Group, Element}.

Retrieving the groups an element belongs to requires an ets:match_object/2
or similar such as:
ets:match_object(elements, {Element, _ = '_'}).

Retrieving the elements a group belongs to requires an ets:match_object/2
or similar such as:
ets:match_object(groups, {Group, _ = '_'}).

So retrieving requires the traversing of a table, but it should be
relatively quick, given that the match is done on the index itself.

Deleting is not particularly optimized though, because it requires the
traversing of a table with elements that are not indexed. For instance,
deleting an Element would look something like:
ets:match_delete(elements, {Element, _ = '_'}). %% fast, indexed
ets:match_delete(groups, {_ = '_', Element}). %% not fast, table is being
traversed.

What would *you* do? Any takers are welcome ^^_

Cheers,
r.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20191207/bb2744d2/attachment.htm>