[erlang-questions] Data structures are strange things

Sat Jun 9 02:03:19 CEST 2012

The flatter trees do work better, but the difference for lookup should be small as you end up doing the same comparisons as in a binary tree.  One problem I found was that code size grows as there become more cases to handle. Unless I can find/come up with some really smart compact handling then 2-3 trees seems to be a reasonable trade-off with slightly larger code but faster. Code grew with 2-3-4 trees as there was a new node representation to handle which is more complex than the other ones. So something smart is needed here.

Anyway there should soon be a tree alternative for the dict API.

Robert

----- Original Message -----
> "I don't know why the straight multi-size versions are faster but my
> guess is that the pattern-matching/data construction is faster on the
> flat bigger nodes rather than on the nested pairs...."
> 
> Just a guess here (but at least a guess informed by an obsession with
> search tree data structures, several decades ago):
> 
> Before you go looking for the effects of a reduction of space,
> remember that the conventional time complexity analysis for
> operations
> on such data structures is predicated on destructive updates. An
> O(log
> n) time complexity for delete or update represents the *worst case*.
> Some operations will result in no rebalancing at all, just the
> addition or removal of a node.
> 
> Without destructive updates, you always need to rebuild/rewrite tree
> nodes from the point of update up to the root. Even if nothing
> substantive changes along the way. It'll cost ya. In space, too.
> 
> How can you reduce these costs? By systematically reducing the
> distance to the root. You can do that by choosing a structure with a
> higher (average) branching factor. The higher the branching factor of
> your search structure, the shallower it will be, and the fewer
> node-allocate-and-rewrite operations you'll need to do on your way
> from the updated node to the root. (This probably reduces GC costs as
> well, though I expect that would be a second-order effect.)
> 
> Increasing the branching factor might have diminishing returns after
> some point, but I doubt that a branching factor of 4 is anywhere near
> that point. 2-3 trees, and 2-3-4 trees, have higher (average)
> branching factors than binary trees. Leaning RB trees are binary, but
> there might be something in how they lean that effectively reduces
> the
> average distance from updated node to root. I don't know anything
> about them.
> 
> "P.S. This is the reason why there was hoped to be many different
> dict
> modules with the same API but different data structures. So as it to
> make it easier to choose the most suitable version for each occasion.
> There is no generally "best" choice, even among trees."
> 
> Sign me up for that.
> 
> -michael turner
> 
> On Fri, Jun 8, 2012 at 10:32 AM, Richard O'Keefe <ok@REDACTED>
> wrote:
> > If you take a look at a typical 2-3-4 tree implementation in Java
> >    http://www.cs.berkeley.edu/~jrs/61bs02/hw/hw7/dict/Tree234.java
> >>>  http://www.cs.berkeley.edu/~jrs/61bs02/hw/hw7/dict/Tree234Node.java
> > you'll see nodes containing
> >
> >        an integer saying how many keys there are
> >        three keys
> >        four links to children nodes
> >        a link to a parent node
> >
> > Allow one word for "what is the class of this object" and that's
> > 10 words per node.
> >
> > Using Erlang tuples, we'd have
> >        {K1,C0,C1}                      4 words
> >        {K1,K2,C0,C1,C2}                6 words
> >        {K1,K2,K3,C0,C1,C2,C3}          8 words
> > WAM-based Prolog systems would take the same space;
> > SML/NJ should be able to shave a word off each (using the bottom
> > 2 bits of a pointer to say which of the three cases applies).
> >
> > In a functional language, when you make a change you create new
> > nodes,
> > so no node ever has to change size/shape.  You _could_ do that in
> > Java
> > too, but people often don't think of it.
> >
> > It seems pretty clear that the Prolog/Erlang/SML version of a 2-3-4
> > tree is going to take *less* space than a binary tree in the same
> > language.  Since in these languages there is often a strong
> > correlation
> > between space and time, I find Robert Virding's results pleasant
> > but
> > not that surprising.
> >
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
>