[erlang-questions] Data structures are strange things
Robert Virding
rvirding@REDACTED
Tue Jul 15 22:30:55 CEST 2014
No, I never published them. I still have them of course. The one I prefer
is the 2-3 trees which I use in my luerl implementation. Maps could replace
them but they are lacking some functions I need. It will be interesting to
see how they compare in speed.
Robert
On 14 July 2014 08:40, Kirill Zaborsky <qrilka@REDACTED> wrote:
> Hi Robert,
> I have came across this a bit old topic and I wonder - did you publish any
> of those modules?
> I have only found your repo https://github.com/rvirding/rb which is even
> 3 years older than this thread :)
>
> Regards,
> Kirill
>
> пятница, 8 июня 2012 г., 4:55:29 UTC+4 пользователь Robert Virding написал:
>
>> I was finally going to release some alternate modules with the dict
>> interface using tree structures, either AA trees or Red Black trees. These
>> are relatively well-known binary tree structures which are binary tree
>> versions of 2-3 trees and 2-3-4 trees respectively. These contain codes
>> with either 2, 3 or 4 sub-nodes. Generally the binary versions are
>> recommended as being better/easier/more space efficient.
>>
>> Before I released my modules I thought it would be interesting to try the
>> non-binary versions to see how they behaved and what it was lie to program
>> them. I was lucky to find a very good description of 2-3 trees and the
>> rules for inserting/deleting elements; it was literally just translate the
>> rules straight to Erlang and it worked. The code was a little more complex
>> than working with the binary versions, not much. In some cases actually
>> simpler. The I decided to measure speed, and the fun started.
>>
>> All tests were done with sequences of random numbers as keys, the same
>> for each implementation.
>>
>> What I found was that the lookup times were about the same as for the
>> binary versions, which was expected, but the insertion/deletion times were
>> about 20% faster. This compared to my AA and RB tree versions. To check
>> that it wasn't just bad coding on my part (not bloody likely :-)) I
>> compared it with gb_trees and got the similar results. Also checking the
>> data sizes I found that the 2-3 and 2-3-4 trees actually used *less* space
>> than their binary counterparts.
>>
>> Here are the figures. I have also included dict, which is the slowest on
>> insertion/deletion but significantly faster lookup times. All times are in
>> usecs for the whole set:
>>
>> Operation Load Fetch Load Fetch
>>
>> # of elements 100000 10000
>> Overhead time 43000 3700
>>
>> ttdict 2-3 trees 310000 161000 24000 15500
>> ttfdict 2-3-4 trees 312000 173000 24200 16500
>> llrbdict Left Leaning RB trees 390000 164000 27100 14900
>>
>> rbdict Red Black trees 380000 167000 27100 15000
>> aadict AA trees 423000 167000 30000 15000
>>
>> gb_trees using faster insert/3 370000 190000 26500 16200
>> dict 522000 107000 28000 11000
>>
>>
>> (I hope the table is readable)
>> Gb_trees are a variant of AA trees with a slightly different algorithm.
>> One should be careful interpreting exact values here but the trend is clear.
>>
>> I don't know why the straight multi-size versions are faster but my guess
>> is that the pattern-matching/data construction is faster on the flat bigger
>> nodes rather than on the nested pairs. My guess is that when they comment
>> on the 2-3(-4) trees using more space it is because they use the same size
>> node for all cases which in the cases with fewer sub-nodes wastes space,
>> while just used different sized tuples which didn't waste space; it also
>> made the pattern matching much easier.
>>
>> It looks like I will be submitting the 2-3 tree version, ttdict, as
>> reasonable choice. Next I will try AVL trees which were discarded long ago
>> and see how they fare.
>>
>> Robert
>>
>> P.S. This is the reason why there was hoped to be many different dict
>> modules with the same API but different data structures. So as it to make
>> it easier to choose the most suitable version for each occasion. There is
>> no generally "best" choice, even among trees.
>>
>>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140715/b88e177d/attachment.htm>
More information about the erlang-questions
mailing list