[erlang-questions] Sets, ordsets, gb_sets, sofs. What is going on?

Fri Mar 26 13:20:11 CET 2010

2010/3/26 Björn Gustavsson <bgustavsson@REDACTED>

> On Thu, Mar 25, 2010 at 10:59 PM, Fred Hebert <mononcqc@REDACTED> wrote:
>
> If something is to be included in the documentation, it must be well
> researched and any recommendations should be based on benchmarks
> results (not on intuition or folklore).
>

Yes, of course. I would have thought there was some rationale behind having
the four modules - that people would not reinvent the wheel for the sake of
it. This is partially what I am after. I'm not advocating putting unverified
claims in the documentation -- if there were any author associated with
these modules, I'd have rather contacted them directly to know the details.
Maybe I should have taken a look at the commit logs.

>
> Here are some rules of thumb based on my own experience and
> measurements of various operations in the Wings3D application. Note
> that the type of the data and the size of the process heap can influence
> the results, so if you are writing a time-critical application, you should
> do
> your own measurements.
>
> Anyway, my recommendation is to use the gb_sets module in most
> circumstances. gb_sets has the following properties:
>
> 1. Creating a gb_set from a list is very fast, as is converting a
> gb_set to to a sorted list. That means that if you need to process
> the set using sofs or your own code, you don't have to be afraid
> of converting between representations.
>
> 2. Incrementally constructing a gb_set by adding one element
> at the time is also fast (but not as fast doing gb_sets:from_list/1).
> As far as I know (I haven't done measurements), incrementally
> building up a gb_set is faster than incrementally building a
> 'sets' set.
>

Thanks for the reply. It sounds pretty much like the differences I
benchmarked  between gb_trees and the dict module (the comments mention
they're based on the same respective algorithms, so it makes sense) -- dicts
are faster on reads, while gb_trees are faster on most other operations.

>
> The ordsets module is useful if you'll need a known representation
> that you can process with your own code without conversion and/or
> the sets are very small.
>
Oh, good to know. I tend to treat most datatypes as opaque. I guess it is
wanted that ordsets makes it easy to operate with lists and whatnot.

>
> If your set must use the '=:=' operator, 'sets' is the only module
> you can use. Also, test for membership will probably be faster
> with 'sets' if the set has many elements. But note that
> sets:from_list/1 is not any faster than incrementally
> inserting one element at a time into an initially empty set,
> and is much slower than gb_sets:from_list/1.
>
> --
> Björn Gustavsson, Erlang/OTP, Ericsson AB
>

Thanks for your reply, much appreciated.