[erlang-questions] dict, orddict and gb_tree
Robert Virding
robert.virding@REDACTED
Mon Feb 19 03:04:21 CET 2007
The plot thickens!
It wasn't difficult to generate the random numbers! Erik Stenman sent me
an easy method which works well. I have now converted my db_test module
to work on lists of keys, and added a function which can generate keys
after various rules, including random numbers.
Here are some new test results. As before all times are in millisecs and
are for a whole sequence of keys. For each sequence length I generate 5
random sequences and the results are the average of the runs.
Again these are for random sets of Size integer keys from 1 upto Size:
Size dict rbdict gb_trees
add fet upd add fet upd add/ent fet upd/ent
10000 42 6.5 31 37.5 5.8 36.8 30.9/46 6.3 30.3/43
20000 98 13.8 74 85.8 13.7 86 72.3/104 16 72/102
50000 316 41 303 254 41 260 202/291 47 222/302
100000 854 112 913 573 101 619 478/674 125 549/720
200000 2435 236 2786 1387 237 1430 1090/1527 301 1274/1626
500000 18042 694 28868 4277 736 4163 3451/4659 963 3859/4836
Some comments:
1. They basically show the same trends as before.
2. There is a larger spread in the values for the tree algorithms than
for dict.
3. I can't explain why fetching is faster in rb-trees than in gb-trees.
4. dict still surprises with fast lookup times.
If any one can find some mistake in my code which lead to systematic
errors please tell me. Soon there will have to be more statistical
results as well. :-) Not!
I will soon release rbdict which will be plug-in compatible with idct
and orddict. The module must be cleaned up a little first. Then I can
add a delete column to my table.
Robert
Robert Virding wrote:
> I will try some more scan patterns to see what types of figures I get. I
> don't really envision calculating random permutations of 500k element
> lists. :-)
>
> Richard Carlsson wrote:
>
>>Robert Virding wrote:
>>
>>
>>>3. gb_trees is very slow on adding new values, but rather fast on
>>>fetching and updating them. I don't know why this is so.
>>
>>
>>There were several posts to this list years ago with similar
>>measurements, and I've never seen such strange numbers for add/enter
>>before. In particular, the difference between enter and add (when the
>>key does in fact not exist in the tree) should be precisely the time
>>it takes to do a lookup. Your numbers show a much greater difference
>>between enter and add, so something seems to be fishy here.
>>
>>
>>>4. It is a big win for gb_trees if you know whether you have an
>>>existing key or not when modifying the table.
>>
>>
>>Yes, that is because if the key is known to be present, there will
>>be no need to do any rebalancing (or even check for rebalancing), so
>>the update can be made much faster. Hence, for algorithms that do a
>>lot of updates, it can definitely be better to fully populate the tree
>>in advance, so that update can be used throughout instead of enter.
>>(However, the functional-array implementation that Dan G. and I posted
>>to the list a while back is much faster still, if you have a range of
>>integer keys.)
>>
>>
>>>5. I was a bit surprised that red-black trees could match gb_trees in
>>>fetching as red-black allows some inbalance in the tree. Perhaps the
>>>order of insertion was right for it?
>>
>>
>>Try a random set of keys and let us know.
>>
>> /Richard
>>
>>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: db_test.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20070219/1b437cb7/attachment.ksh>
More information about the erlang-questions
mailing list