[erlang-questions] Data structures are strange things

Sat Jun 9 15:48:19 CEST 2012

On 06/09/2012 01:51 AM, Robert Virding wrote:
> No, it is not surprising when you think about it. I just found it a bit surprising that the conventions which seem to apply to imperative languages have been more less imported into the functional/declarative world where they don't apply to the same degree. It seems that in the functional world going "red-black" seems to be accepted even though there aren't many benefits of it. The code is a little simpler, but for 2-3 trees the difference is very small. And it is faster for insertion and deletion.
>
> ----- Original Message -----
>> If you take a look at a typical 2-3-4 tree implementation in Java
>>      http://www.cs.berkeley.edu/~jrs/61bs02/hw/hw7/dict/Tree234.java
>>>>   http://www.cs.berkeley.edu/~jrs/61bs02/hw/hw7/dict/Tree234Node.java
>> you'll see nodes containing
>>
>> 	an integer saying how many keys there are
>> 	three keys
>> 	four links to children nodes
>> 	a link to a parent node
>>
>> Allow one word for "what is the class of this object" and that's
>> 10 words per node.
>>
>> Using Erlang tuples, we'd have
>> 	{K1,C0,C1}			4 words
>> 	{K1,K2,C0,C1,C2}		6 words
>> 	{K1,K2,K3,C0,C1,C2,C3}		8 words

One advantage the General Balanced trees have compared to RB-trees and 
AVL trees is that they don't need any additional information per node 
(like a colour or size). Each GB-tree node is {Key, Value, Smaller, 
Bigger} (5 words), and adding an extra field would mean 6 words, a 20% 
increase both in memory usage and in words written per node update. The 
code is also very straightforward, since there are no special cases 
anywhere, only a check for rebalancing.

It makes sense that 2-3-(4) trees could perform even better, since they 
reduce the height of the tree and can effectively use the size of the 
Erlang tuple as the additional balancing information as Robert and ROK 
noted. 2-3 trees are just the simplest case of B-trees, so all leaves 
are at the same depth. (It's balancing on deletion that should be the 
tricky part.)

Our experiences from implementing the array module indicate that the 
optimal node size for a tree in Erlang is around 8-10 elements, so the 
most efficient tree implementation would probably be something like a 
2-3-...-8 tree with everything unrolled. :-) (You could of course 
generate the code from a spec.)

    /Richard