[erlang-questions] Data structures are strange things

Sat Jun 9 16:03:56 CEST 2012

I suggest *artificially* increasing the node sizes, and seeing how
well the results hold up. I suspect the differences in performance are
strongly dominated by the differences between
distance-from-update-point-to-root, with differences in memory
consumption being a second-order effect.

-michael turner

On Sat, Jun 9, 2012 at 10:48 PM, Richard Carlsson
<carlsson.richard@REDACTED> wrote:
> On 06/09/2012 01:51 AM, Robert Virding wrote:
>>
>> No, it is not surprising when you think about it. I just found it a bit
>> surprising that the conventions which seem to apply to imperative languages
>> have been more less imported into the functional/declarative world where
>> they don't apply to the same degree. It seems that in the functional world
>> going "red-black" seems to be accepted even though there aren't many
>> benefits of it. The code is a little simpler, but for 2-3 trees the
>> difference is very small. And it is faster for insertion and deletion.
>>
>> ----- Original Message -----
>>>
>>> If you take a look at a typical 2-3-4 tree implementation in Java
>>>     http://www.cs.berkeley.edu/~jrs/61bs02/hw/hw7/dict/Tree234.java
>>>>>
>>>>>  http://www.cs.berkeley.edu/~jrs/61bs02/hw/hw7/dict/Tree234Node.java
>>>
>>> you'll see nodes containing
>>>
>>>        an integer saying how many keys there are
>>>        three keys
>>>        four links to children nodes
>>>        a link to a parent node
>>>
>>> Allow one word for "what is the class of this object" and that's
>>> 10 words per node.
>>>
>>> Using Erlang tuples, we'd have
>>>        {K1,C0,C1}                      4 words
>>>        {K1,K2,C0,C1,C2}                6 words
>>>        {K1,K2,K3,C0,C1,C2,C3}          8 words
>
>
> One advantage the General Balanced trees have compared to RB-trees and AVL
> trees is that they don't need any additional information per node (like a
> colour or size). Each GB-tree node is {Key, Value, Smaller, Bigger} (5
> words), and adding an extra field would mean 6 words, a 20% increase both in
> memory usage and in words written per node update. The code is also very
> straightforward, since there are no special cases anywhere, only a check for
> rebalancing.
>
> It makes sense that 2-3-(4) trees could perform even better, since they
> reduce the height of the tree and can effectively use the size of the Erlang
> tuple as the additional balancing information as Robert and ROK noted. 2-3
> trees are just the simplest case of B-trees, so all leaves are at the same
> depth. (It's balancing on deletion that should be the tricky part.)
>
> Our experiences from implementing the array module indicate that the optimal
> node size for a tree in Erlang is around 8-10 elements, so the most
> efficient tree implementation would probably be something like a 2-3-...-8
> tree with everything unrolled. :-) (You could of course generate the code
> from a spec.)
>
>   /Richard
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions