[erlang-questions] Sets, ordsets, gb_sets, sofs. What is going on?

Björn Gustavsson bgustavsson@REDACTED
Fri Mar 26 11:31:08 CET 2010


On Thu, Mar 25, 2010 at 10:59 PM, Fred Hebert <mononcqc@REDACTED> wrote:

> This raises the question about what module to use, and under what
> circumstances. Obviously, sofs is to be used whenever 'advanced' set
> operations are required. However, it is not clear why ordsets should be used
> over this one whenever the sets are small -- except if you plan to later
> move on to either sets or gb_sets or when you need '=:=' as a comparison
> function.
>
> Then I'm not sure what's the point of using 'sets' over gb_sets (or the
> opposite). The gb_sets implementation has a bunch more functions available
> than sets, but that's about it. I suppose performance must be relatively
> close between both modules, as it is between gb_trees and dicts. It is also
> mentioned that gb_sets compare badly to ordsets when operating on many sets
> of almost equal size, but otherwise do a much better job.
>
> Is there any way to know what I should suggest to readers [or use myself],
> depending on the use cases?
>
> I think this lack of comparisons is something that would be greatly
> appreciated in Erlang's documentation (if there is anything about that
> already, I'll be glad to read it!) I got the same kind of questions relative
> to dicts and gb_trees, although there are fewer modules to compare in this
> case.

If something is to be included in the documentation, it must be well
researched and any recommendations should be based on benchmarks
results (not on intuition or folklore).

Here are some rules of thumb based on my own experience and
measurements of various operations in the Wings3D application. Note
that the type of the data and the size of the process heap can influence
the results, so if you are writing a time-critical application, you should do
your own measurements.

Anyway, my recommendation is to use the gb_sets module in most
circumstances. gb_sets has the following properties:

1. Creating a gb_set from a list is very fast, as is converting a
gb_set to to a sorted list. That means that if you need to process
the set using sofs or your own code, you don't have to be afraid
of converting between representations.

2. Incrementally constructing a gb_set by adding one element
at the time is also fast (but not as fast doing gb_sets:from_list/1).
As far as I know (I haven't done measurements), incrementally
building up a gb_set is faster than incrementally building a
'sets' set.

The ordsets module is useful if you'll need a known representation
that you can process with your own code without conversion and/or
the sets are very small.

If your set must use the '=:=' operator, 'sets' is the only module
you can use. Also, test for membership will probably be faster
with 'sets' if the set has many elements. But note that
sets:from_list/1 is not any faster than incrementally
inserting one element at a time into an initially empty set,
and is much slower than gb_sets:from_list/1.

-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB


More information about the erlang-questions mailing list