[erlang-questions] Sets, ordsets, gb_sets, sofs. What is going on?

Robert Virding rvirding@REDACTED
Fri Mar 26 17:48:05 CET 2010


The original idea behind having different implementations of sets was
the acceptance of that no implementation is "best" for everything and
everyone. We would therefore define an API and a number of modules
which followed this API but were based on different algorithms so
users could easily experiment and find which was best for their app.
We did two initial implementations: sets which uses hashing algorithm
(the same algorithm as ets actually) but whose structure is otherwise
undefined and ordsets which uses the defined ordered list.

It was never meant that these were to be the only set modules, but
that these were a base upon which others could be added and by having
the same API it would be easier for users to choose. It was also never
meant that the API was cast in stone and could not be extended with
more useful functions, the only requirement being that they work on
all implementations. Gb_sets provide an alternative tree based
algorithm which for many apps will be the best. While it does provide
a sets compatible API it is unfortunate that it seems like an
afterthought and not the base set of functions.

While ordsets is usually slower it does have one benefit in that it is
easy to read when printed which is definitely not the case for any
others. I often find that unless the set is big and the set operations
critical then the difference in speed is less noticeable than you
would expect.

The discussions apply for dict's as well but here it is even more
unfortunate that gb_trees does have dict compatible API.

The discussion about using or not using '=:=' operator for comparison
is an unfortunate by-product of Erlang not having a complete set of
pure term comparison operators which do not convert integers. With
20-20 hindsight it would have been much better to have a set of proper
term comparison operators and a separate set of numerical comparison
operators which only work on numbers. Then there would be no
difference.

I will be submitting (real soon now) a patch which slightly extends
the sets/dict APIs and adds compatible modules which use another
tree-based algorithm (red-black trees) but which are API compatible.
Just to confuse the issue.

That was a bit about the thoughts behind the dict/sets modules.

Robert

P.S. I have another question. How do we handle/where do we put code
which may not be directly useful but would be good as a base for user
specialisation? As an example I have another tree based implementation
of dict/sets which uses the same algorithm as in gb_trees but
expressed in a simpler way. It is almost as good as the red-black
trees but much simpler and would be a good base for someone who needs
a specialised dict or sets module.


More information about the erlang-questions mailing list