[erlang-questions] dict, orddict and gb_tree
Ludovic Coquelle
lcoquelle@REDACTED
Wed Feb 14 02:28:19 CET 2007
Really great! I like to see figures so that I have an idea of what is "a
large amount of data".
Of course, it would be good to have this in trapexit, or maybe even better
in the OTP/Erlang documentation itself (thinking to a chapter "Data
structures/containers" in section "Programming examples"). I really do think
all participations in this mailing list thread are important and I'm not
fond of bookmarking mailing list as documentation references :).
Maybe a good time to learn how to create a post in trapexit ;).
Thanks!
On 2/14/07, Robert Virding <robert.virding@REDACTED> wrote:
>
> Robert Baruch wrote:
> > On Feb 9, 2007, at 10:11 AM, Richard Carlsson wrote:
> >
> >>gb_trees is slightly more lightweight and a little bit faster on most
> >>operations than dict, so they are particularly good if you have large
> >>numbers of moderately sized tables. But in general, you're not likely
> >>to notice any difference until you really start to push the limits, so
> >>for everyday programs, you can use either without loss.
> >
> > I wrote some code a few days ago that had to store about 17,000 key/
> > value pairs, where the key was an integer. I found that gb_trees was
> > far faster in lookup than dict was. I'm not sure if 17,000 is
> > supposed to be past the limits, but it sure was in my case :)
>
> I had to test this so I wrote a small test program to compare the
> different implementations. It builds dictionaries with integer keys and
> a single atom as value. Three things are tested: add a new key/value to
> a dictionary, fetch all the values, and update all the values. To try
> and get a little less regular sequence of keys the keys are taken from
> the middle out of the ranges. So, for example, with 1000 keys the
> sequence is 500,501,499,502,498,503,...
>
> Gb_trees has different functions for adding/updating the trees depending
> on whether you KNOW if the key already exists, add, enter and update.
> There are two values for gb_trees for adding a new value and updating an
> existing value, they are using add/enter and update/enter.Enter takes
> longer time as it much first check whether the key is already in the
> tree. I don't know if this due to the algorithm or the implementation.
> If this is a problem depends, of course, on your application.
>
> For fun I implemented red-black trees directly out of Okasaki to get
> some more comparison, this is rbdict. It is dict/orddict compatible. Of
> course.
>
> Here are the tables, first one with orddict for the smaller sizes and
> then one without (it was getting too wide to fit in). The figures are
> millisec and rough averages over a number of runs. They are good enough
> to see the trends.
>
>
> Size orddict dict rbdict gb_trees
> add fet upd add fet upd add fet upd add fet upd
>
> 5000 563 550 1300
> 10000 2250 2200 4718 32 15 32 31 15 31 78/125 15 31/31
> 20000 8500 9200 18890 94 15 70 78 15 70 200/266 16 46/79
>
> Size dict rbdict gb_trees
> add fet upd add fet upd add fet upd
>
> 10000 32 15 32 31 15 31 78/125 15 31/31
> 20000 94 15 70 78 15 70 200/266 16 46/79
> 50000 300 40 300 200 40 180 625/830 45 120/200
> 100000 850 100 850 420 94 390 1450/1900 95 266/438
> 200000 2580 230 2610 890 190 870 3300/5450 190 550/920
> 500000 17400 680 21500 2420 500 2350 11750/19000 500 1440/2520
>
>
> Some comments about the results:
>
> 1. orddict is nice and quadratic as it should be. It is really only
> suitable for small trees.
> 2. Seeing I implemented dict I must say that it produced good results
> and hung on quite well, until about 50000 elements. This is not due to
> algorithm itself but rather due to the way in which the internal table
> is grown. Should probaby be fixed.
> 3. gb_trees is very slow on adding new values, but rather fast on
> fetching and updating them. I don't know why this is so.
> 4. It is a big win for gb_trees if you know whether you have an existing
> key or not when modifying the table.
> 5. I was a bit surprised that red-black trees could match gb_trees in
> fetching as red-black allows some inbalance in the tree. Perhaps the
> order of insertion was right for it?
> 6. (The obvious) There is no best algorithm but you should test your
> application to find out which is best for you.
>
>
> It was to help with the last point that dict/orddict have exactly the
> same interface. The idea was that you could have a whole range of XXdict
> libraries and choose the one which best fit your needs, all you need to
> do to use another is change the module name, a DICT macro.
>
> I may just complete rbdict and add it to the libraries. This may be the
> time to extend the interface with more useful functions.
>
> I don't know if the has helped any, or just added to the confusion, but
> at least we have figures. :-) I include my test module, check for any
> really bad errors.
>
> Robert
>
> P.S. I wonder what trapexit will do to these tables? :-)
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20070214/1f227c85/attachment.htm>
More information about the erlang-questions
mailing list