Sets, ordsets, gb_sets, sofs. What is going on?

Fred Hebert mononcqc@REDACTED
Thu Mar 25 22:59:08 CET 2010


I'm working on the next chapter of Learn You Some Erlang and I've decided to
describe common data structures (proplists, orddicts, dicts, gb_trees,
arrays, etc.) I wanted to discuss a few of the sets modules and then maybe
hint for digraph after that.

However, studying the different sets modules (sets, ordsets, gb_sets and
sofs), I'm starting to wonder what that is all about and what is the purpose
of each of them.

Here's what I got:

   - sets: module implemented on top of a structure similar to the one used
   in the dict module. Uses =:= as the comparison function;
   - ordsets: implemented as a sorted list. Uses == as the comparison
   function. Same interface as sets.
   - gb_sets: implemented over the gb_trees structure. Uses == as a
   comparison function. Same interface as sets and ordsets.
   - sofs: implemented as an ordered list with some meta data inside a
   tuple. Can also contain ordered sets (tuples or single values). Implements
   pretty much every standard set operation seen in textbooks, including
   relations, functions and whatnot. Seems to use '==' as a comparison
   function.

This raises the question about what module to use, and under what
circumstances. Obviously, sofs is to be used whenever 'advanced' set
operations are required. However, it is not clear why ordsets should be used
over this one whenever the sets are small -- except if you plan to later
move on to either sets or gb_sets or when you need '=:=' as a comparison
function.

Then I'm not sure what's the point of using 'sets' over gb_sets (or the
opposite). The gb_sets implementation has a bunch more functions available
than sets, but that's about it. I suppose performance must be relatively
close between both modules, as it is between gb_trees and dicts. It is also
mentioned that gb_sets compare badly to ordsets when operating on many sets
of almost equal size, but otherwise do a much better job.

Is there any way to know what I should suggest to readers [or use myself],
depending on the use cases?

I think this lack of comparisons is something that would be greatly
appreciated in Erlang's documentation (if there is anything about that
already, I'll be glad to read it!) I got the same kind of questions relative
to dicts and gb_trees, although there are fewer modules to compare in this
case.

Any help is appreciated.


More information about the erlang-questions mailing list