[erlang-questions] Data structures are strange things
Kirill Zaborsky
qrilka@REDACTED
Mon Jul 14 08:40:56 CEST 2014
Hi Robert,
I have came across this a bit old topic and I wonder - did you publish any
of those modules?
I have only found your repo https://github.com/rvirding/rb which is even 3
years older than this thread :)
Regards,
Kirill
пятница, 8 июня 2012 г., 4:55:29 UTC+4 пользователь Robert Virding написал:
>
> I was finally going to release some alternate modules with the dict
> interface using tree structures, either AA trees or Red Black trees. These
> are relatively well-known binary tree structures which are binary tree
> versions of 2-3 trees and 2-3-4 trees respectively. These contain codes
> with either 2, 3 or 4 sub-nodes. Generally the binary versions are
> recommended as being better/easier/more space efficient.
>
> Before I released my modules I thought it would be interesting to try the
> non-binary versions to see how they behaved and what it was lie to program
> them. I was lucky to find a very good description of 2-3 trees and the
> rules for inserting/deleting elements; it was literally just translate the
> rules straight to Erlang and it worked. The code was a little more complex
> than working with the binary versions, not much. In some cases actually
> simpler. The I decided to measure speed, and the fun started.
>
> All tests were done with sequences of random numbers as keys, the same for
> each implementation.
>
> What I found was that the lookup times were about the same as for the
> binary versions, which was expected, but the insertion/deletion times were
> about 20% faster. This compared to my AA and RB tree versions. To check
> that it wasn't just bad coding on my part (not bloody likely :-)) I
> compared it with gb_trees and got the similar results. Also checking the
> data sizes I found that the 2-3 and 2-3-4 trees actually used *less* space
> than their binary counterparts.
>
> Here are the figures. I have also included dict, which is the slowest on
> insertion/deletion but significantly faster lookup times. All times are in
> usecs for the whole set:
>
> Operation Load Fetch Load Fetch
> # of elements 100000 10000
> Overhead time 43000 3700
>
> ttdict 2-3 trees 310000 161000 24000 15500
> ttfdict 2-3-4 trees 312000 173000 24200 16500
> llrbdict Left Leaning RB trees 390000 164000 27100 14900
>
> rbdict Red Black trees 380000 167000 27100 15000
> aadict AA trees 423000 167000 30000 15000
>
> gb_trees using faster insert/3 370000 190000 26500 16200
> dict 522000 107000 28000 11000
>
>
> (I hope the table is readable)
> Gb_trees are a variant of AA trees with a slightly different algorithm.
> One should be careful interpreting exact values here but the trend is clear.
>
> I don't know why the straight multi-size versions are faster but my guess
> is that the pattern-matching/data construction is faster on the flat bigger
> nodes rather than on the nested pairs. My guess is that when they comment
> on the 2-3(-4) trees using more space it is because they use the same size
> node for all cases which in the cases with fewer sub-nodes wastes space,
> while just used different sized tuples which didn't waste space; it also
> made the pattern matching much easier.
>
> It looks like I will be submitting the 2-3 tree version, ttdict, as
> reasonable choice. Next I will try AVL trees which were discarded long ago
> and see how they fare.
>
> Robert
>
> P.S. This is the reason why there was hoped to be many different dict
> modules with the same API but different data structures. So as it to make
> it easier to choose the most suitable version for each occasion. There is
> no generally "best" choice, even among trees.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140713/3fe54a2d/attachment.htm>
More information about the erlang-questions
mailing list