[erlang-questions] Data structures are strange things

Robert Virding rvirding@REDACTED
Tue Jul 15 22:30:55 CEST 2014


No, I never published them. I still have them of course. The one I prefer
is the 2-3 trees which I use in my luerl implementation. Maps could replace
them but they are lacking some functions I need. It will be interesting to
see how they compare in speed.

Robert



On 14 July 2014 08:40, Kirill Zaborsky <qrilka@REDACTED> wrote:

> Hi Robert,
> I have came across this a bit old topic and I wonder - did you publish any
> of those modules?
> I have only found your repo https://github.com/rvirding/rb which is even
> 3 years older than this thread :)
>
> Regards,
> Kirill
>
> пятница, 8 июня 2012 г., 4:55:29 UTC+4 пользователь Robert Virding написал:
>
>> I was finally going to release some alternate modules with the dict
>> interface using tree structures, either AA trees or Red Black trees. These
>> are relatively well-known binary tree structures which are binary tree
>> versions of 2-3 trees and 2-3-4 trees respectively. These contain codes
>> with either 2, 3 or 4 sub-nodes. Generally the binary versions are
>> recommended as being better/easier/more space efficient.
>>
>> Before I released my modules I thought it would be interesting to try the
>> non-binary versions to see how they behaved and what it was lie to program
>> them. I was lucky to find a very good description of 2-3 trees and the
>> rules for inserting/deleting elements; it was literally just translate the
>> rules straight to Erlang and it worked. The code was a little more complex
>> than working with the binary versions, not much. In some cases actually
>> simpler. The I decided to measure speed, and the fun started.
>>
>> All tests were done with sequences of random numbers as keys, the same
>> for each implementation.
>>
>> What I found was that the lookup times were about the same as for the
>> binary versions, which was expected, but the insertion/deletion times were
>> about 20% faster. This compared to my AA and RB tree versions. To check
>> that it wasn't just bad coding on my part (not bloody likely :-)) I
>> compared it with gb_trees and got the similar results. Also checking the
>> data sizes I found that the 2-3 and 2-3-4 trees actually used *less* space
>> than their binary counterparts.
>>
>> Here are the figures. I have also included dict, which is the slowest on
>> insertion/deletion but significantly faster lookup times. All times are in
>> usecs for the whole set:
>>
>> Operation                        Load        Fetch       Load        Fetch
>>
>> # of elements                         100000                  10000
>> Overhead time                          43000                   3700
>>
>> ttdict 2-3 trees    310000      161000      24000       15500
>> ttfdict 2-3-4 trees   312000      173000      24200       16500
>> llrbdict   Left Leaning RB trees 390000      164000      27100       14900
>>
>> rbdict     Red Black trees 380000      167000      27100       15000
>> aadict     AA trees 423000      167000      30000       15000
>>
>> gb_trees   using faster insert/3 370000      190000      26500       16200
>> dict       522000      107000      28000       11000
>>
>>
>> (I hope the table is readable)
>> Gb_trees are a variant of AA trees with a slightly different algorithm.
>> One should be careful interpreting exact values here but the trend is clear.
>>
>> I don't know why the straight multi-size versions are faster but my guess
>> is that the pattern-matching/data construction is faster on the flat bigger
>> nodes rather than on the nested pairs. My guess is that when they comment
>> on the 2-3(-4) trees using more space it is because they use the same size
>> node for all cases which in the cases with fewer sub-nodes wastes space,
>> while just used different sized tuples which didn't waste space; it also
>> made the pattern matching much easier.
>>
>> It looks like I will be submitting the 2-3 tree version, ttdict, as
>> reasonable choice. Next I will try AVL trees which were discarded long ago
>> and see how they fare.
>>
>> Robert
>>
>> P.S. This is the reason why there was hoped to be many different dict
>> modules with the same API but different data structures. So as it to make
>> it easier to choose the most suitable version for each occasion. There is
>> no generally "best" choice, even among trees.
>>
>>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20140715/b88e177d/attachment.htm>


More information about the erlang-questions mailing list