[erlang-questions] Data structures are strange things
Richard Carlsson
carlsson.richard@REDACTED
Sat Jun 9 15:48:19 CEST 2012
On 06/09/2012 01:51 AM, Robert Virding wrote:
> No, it is not surprising when you think about it. I just found it a bit surprising that the conventions which seem to apply to imperative languages have been more less imported into the functional/declarative world where they don't apply to the same degree. It seems that in the functional world going "red-black" seems to be accepted even though there aren't many benefits of it. The code is a little simpler, but for 2-3 trees the difference is very small. And it is faster for insertion and deletion.
>
> ----- Original Message -----
>> If you take a look at a typical 2-3-4 tree implementation in Java
>> http://www.cs.berkeley.edu/~jrs/61bs02/hw/hw7/dict/Tree234.java
>>>> http://www.cs.berkeley.edu/~jrs/61bs02/hw/hw7/dict/Tree234Node.java
>> you'll see nodes containing
>>
>> an integer saying how many keys there are
>> three keys
>> four links to children nodes
>> a link to a parent node
>>
>> Allow one word for "what is the class of this object" and that's
>> 10 words per node.
>>
>> Using Erlang tuples, we'd have
>> {K1,C0,C1} 4 words
>> {K1,K2,C0,C1,C2} 6 words
>> {K1,K2,K3,C0,C1,C2,C3} 8 words
One advantage the General Balanced trees have compared to RB-trees and
AVL trees is that they don't need any additional information per node
(like a colour or size). Each GB-tree node is {Key, Value, Smaller,
Bigger} (5 words), and adding an extra field would mean 6 words, a 20%
increase both in memory usage and in words written per node update. The
code is also very straightforward, since there are no special cases
anywhere, only a check for rebalancing.
It makes sense that 2-3-(4) trees could perform even better, since they
reduce the height of the tree and can effectively use the size of the
Erlang tuple as the additional balancing information as Robert and ROK
noted. 2-3 trees are just the simplest case of B-trees, so all leaves
are at the same depth. (It's balancing on deletion that should be the
tricky part.)
Our experiences from implementing the array module indicate that the
optimal node size for a tree in Erlang is around 8-10 elements, so the
most efficient tree implementation would probably be something like a
2-3-...-8 tree with everything unrolled. :-) (You could of course
generate the code from a spec.)
/Richard
More information about the erlang-questions
mailing list