[erlang-questions] Data structures are strange things

Kirill Zaborsky qrilka@REDACTED
Mon Jul 14 08:40:56 CEST 2014


Hi Robert,
I have came across this a bit old topic and I wonder - did you publish any 
of those modules?
I have only found your repo https://github.com/rvirding/rb which is even 3 
years older than this thread :)

Regards,
Kirill

пятница, 8 июня 2012 г., 4:55:29 UTC+4 пользователь Robert Virding написал:
>
> I was finally going to release some alternate modules with the dict 
> interface using tree structures, either AA trees or Red Black trees. These 
> are relatively well-known binary tree structures which are binary tree 
> versions of 2-3 trees and 2-3-4 trees respectively. These contain codes 
> with either 2, 3 or 4 sub-nodes. Generally the binary versions are 
> recommended as being better/easier/more space efficient.
>
> Before I released my modules I thought it would be interesting to try the 
> non-binary versions to see how they behaved and what it was lie to program 
> them. I was lucky to find a very good description of 2-3 trees and the 
> rules for inserting/deleting elements; it was literally just translate the 
> rules straight to Erlang and it worked. The code was a little more complex 
> than working with the binary versions, not much. In some cases actually 
> simpler. The I decided to measure speed, and the fun started.
>
> All tests were done with sequences of random numbers as keys, the same for 
> each implementation.
>
> What I found was that the lookup times were about the same as for the 
> binary versions, which was expected, but the insertion/deletion times were 
> about 20% faster. This compared to my AA and RB tree versions. To check 
> that it wasn't just bad coding on my part (not bloody likely :-)) I 
> compared it with gb_trees and got the similar results. Also checking the 
> data sizes I found that the 2-3 and 2-3-4 trees actually used *less* space 
> than their binary counterparts.
>
> Here are the figures. I have also included dict, which is the slowest on 
> insertion/deletion but significantly faster lookup times. All times are in 
> usecs for the whole set:
>
> Operation                        Load        Fetch       Load        Fetch
> # of elements                         100000                  10000
> Overhead time                          43000                   3700
>
> ttdict 2-3 trees    310000      161000      24000       15500
> ttfdict 2-3-4 trees   312000      173000      24200       16500
> llrbdict   Left Leaning RB trees 390000      164000      27100       14900
>
> rbdict     Red Black trees 380000      167000      27100       15000
> aadict     AA trees 423000      167000      30000       15000
>
> gb_trees   using faster insert/3 370000      190000      26500       16200
> dict       522000      107000      28000       11000
>
>
> (I hope the table is readable)
> Gb_trees are a variant of AA trees with a slightly different algorithm. 
> One should be careful interpreting exact values here but the trend is clear.
>
> I don't know why the straight multi-size versions are faster but my guess 
> is that the pattern-matching/data construction is faster on the flat bigger 
> nodes rather than on the nested pairs. My guess is that when they comment 
> on the 2-3(-4) trees using more space it is because they use the same size 
> node for all cases which in the cases with fewer sub-nodes wastes space, 
> while just used different sized tuples which didn't waste space; it also 
> made the pattern matching much easier.
>
> It looks like I will be submitting the 2-3 tree version, ttdict, as 
> reasonable choice. Next I will try AVL trees which were discarded long ago 
> and see how they fare.
>
> Robert
>
> P.S. This is the reason why there was hoped to be many different dict 
> modules with the same API but different data structures. So as it to make 
> it easier to choose the most suitable version for each occasion. There is 
> no generally "best" choice, even among trees.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140713/3fe54a2d/attachment.htm>


More information about the erlang-questions mailing list