[erlang-questions] dict, orddict and gb_tree

Mon Feb 19 03:04:21 CET 2007

The plot thickens!

It wasn't difficult to generate the random numbers! Erik Stenman sent me 
an easy method which works well. I have now converted my db_test module 
to work on lists of keys, and added a function which can generate keys 
after various rules, including random numbers.

Here are some new test results. As before all times are in millisecs and 
are for a whole sequence of keys. For each sequence length I generate 5 
random sequences and the results are the average of the runs.

Again these are for random sets of Size integer keys from 1 upto Size:

Size          dict             rbdict               gb_trees
           add   fet   upd    add  fet  upd    add/ent   fet   upd/ent

  10000     42    6.5   31   37.5  5.8  36.8  30.9/46     6.3 30.3/43
  20000     98   13.8   74   85.8 13.7  86    72.3/104    16    72/102
  50000    316   41    303   254  41   260     202/291    47   222/302
100000    854  112    913   573 101   619     478/674   125   549/720
200000   2435  236   2786  1387 237  1430    1090/1527  301  1274/1626
500000  18042  694  28868  4277 736  4163    3451/4659  963  3859/4836

Some comments:

1. They basically show the same trends as before.

2. There is a larger spread in the values for the tree algorithms than 
for dict.

3. I can't explain why fetching is faster in rb-trees than in gb-trees.

4. dict still surprises with fast lookup times.

If any one can find some mistake in my code which lead to systematic 
errors please tell me. Soon there will have to be more statistical 
results as well. :-) Not!

I will soon release rbdict which will be plug-in compatible with idct 
and orddict. The module must be cleaned up a little first. Then I can 
add a delete column to my table.

Robert

Robert Virding wrote:
> I will try some more scan patterns to see what types of figures I get. I 
> don't really envision calculating random permutations of 500k element 
> lists. :-)
> 
> Richard Carlsson wrote:
> 
>>Robert Virding wrote:
>>
>>
>>>3. gb_trees is very slow on adding new values, but rather fast on 
>>>fetching and updating them. I don't know why this is so.
>>
>>
>>There were several posts to this list years ago with similar
>>measurements, and I've never seen such strange numbers for add/enter
>>before. In particular, the difference between enter and add (when the
>>key does in fact not exist in the tree) should be precisely the time
>>it takes to do a lookup. Your numbers show a much greater difference
>>between enter and add, so something seems to be fishy here.
>>
>>
>>>4. It is a big win for gb_trees if you know whether you have an 
>>>existing key or not when modifying the table.
>>
>>
>>Yes, that is because if the key is known to be present, there will
>>be no need to do any rebalancing (or even check for rebalancing), so
>>the update can be made much faster. Hence, for algorithms that do a
>>lot of updates, it can definitely be better to fully populate the tree
>>in advance, so that update can be used throughout instead of enter.
>>(However, the functional-array implementation that Dan G. and I posted
>>to the list a while back is much faster still, if you have a range of
>>integer keys.)
>>
>>
>>>5. I was a bit surprised that red-black trees could match gb_trees in 
>>>fetching as red-black allows some inbalance in the tree. Perhaps the 
>>>order of insertion was right for it?
>>
>>
>>Try a random set of keys and let us know.
>>
>>    /Richard
>>
>>
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
> 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: db_test.erl
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20070219/1b437cb7/attachment.ksh>