dict and set variants

Thu Dec 2 10:50:45 CET 2004

James Hague writes:

 > stdlib has three modules each for sets and dicts (key/value pairs):
 > 
 > sets: sets, ordsets, gb_sets
 > dicts: dict, orddict, gb_trees (and ets, I suppose :)
 > 
 > (BTW, notice the naming discrepancy.  Most are plural, some aren't.)
 > 
 > So, what's the rationale behind three versions of each?  

There isn't any, it's just various people letting code escape from
the lab before it got cleaned up and then writing some half-hearted
documentation before going on to something more interesting.

My favourite is this bit in the 'ordsets' documentation:

   "An ordered list is more efficient than an unordered list."

--------------------

Anyone care to extend and correct this table?

           implementation   insert  find    traverse:random/in-order
----------------------------------------------------------------------
   sets    hash?            1 ?     1 ?     N              /not-possible
ordsets    sorted list      N       N       N              /N
   list    list             1       N       N              /not-possible
gb_sets    tree             lg N    lg N    N ?            /N lg N
    ets    hash             1       1       N              /N

'N' means O(N), 'lg N' means O(log N), etc.

I haven't tried to indicate whether the bound is worst-case or
on-average because, in general, I don't know. I think a random
traverse of a gb_set is O(N), but the documentation doesn't actually
promise that, so maybe it's O(log N).

(Keep in mind that the overhead isn't shown above and that it often
does matter. There's little point using a tree when your set has ten
members.)

Matthias