Really great! I like to see figures so that I have an idea of what is "a large amount of data".<br><br>Of course, it would be good to have this in trapexit, or maybe even better in the OTP/Erlang documentation itself (thinking to a chapter "Data structures/containers" in section "Programming examples"). I really do think all participations in this mailing list thread are important and I'm not fond of bookmarking mailing list as documentation references :).
<br>Maybe a good time to learn how to create a post in trapexit ;).<br><br>Thanks!<br><br><div><span class="gmail_quote">On 2/14/07, <b class="gmail_sendername">Robert Virding</b> <<a href="mailto:robert.virding@telia.com">
robert.virding@telia.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Robert Baruch wrote:<br>> On Feb 9, 2007, at 10:11 AM, Richard Carlsson wrote:
<br>><br>>>gb_trees is slightly more lightweight and a little bit faster on most<br>>>operations than dict, so they are particularly good if you have large<br>>>numbers of moderately sized tables. But in general, you're not likely
<br>>>to notice any difference until you really start to push the limits, so<br>>>for everyday programs, you can use either without loss.<br>><br>> I wrote some code a few days ago that had to store about 17,000 key/
<br>> value pairs, where the key was an integer. I found that gb_trees was<br>> far faster in lookup than dict was. I'm not sure if 17,000 is<br>> supposed to be past the limits, but it sure was in my case :)
<br><br>I had to test this so I wrote a small test program to compare the<br>different implementations. It builds dictionaries with integer keys and<br>a single atom as value. Three things are tested: add a new key/value to
<br>a dictionary, fetch all the values, and update all the values. To try<br>and get a little less regular sequence of keys the keys are taken from<br>the middle out of the ranges. So, for example, with 1000 keys the<br>sequence is 500,501,499,502,498,503,...
<br><br>Gb_trees has different functions for adding/updating the trees depending<br>on whether you KNOW if the key already exists, add, enter and update.<br>There are two values for gb_trees for adding a new value and updating an
<br>existing value, they are using add/enter and update/enter.Enter takes<br>longer time as it much first check whether the key is already in the<br>tree. I don't know if this due to the algorithm or the implementation.
<br>If this is a problem depends, of course, on your application.<br><br>For fun I implemented red-black trees directly out of Okasaki to get<br>some more comparison, this is rbdict. It is dict/orddict compatible. Of<br>course.
<br><br>Here are the tables, first one with orddict for the smaller sizes and<br>then one without (it was getting too wide to fit in). The figures are<br>millisec and rough averages over a number of runs. They are good enough
<br>to see the trends.<br><br><br>Size orddict dict rbdict gb_trees<br> add fet upd add fet upd add fet upd add fet upd<br><br> 5000 563 550 1300<br> 10000 2250 2200 4718 32 15 32 31 15 31 78/125 15 31/31
<br> 20000 8500 9200 18890 94 15 70 78 15 70 200/266 16 46/79<br><br>Size dict rbdict gb_trees<br> add fet upd add fet upd add fet upd<br><br>
10000 32 15 32 31 15 31 78/125 15 31/31<br> 20000 94 15 70 78 15 70 200/266 16 46/79<br> 50000 300 40 300 200 40 180 625/830 45 120/200<br>100000 850 100 850 420 94 390 1450/1900 95 266/438
<br>200000 2580 230 2610 890 190 870 3300/5450 190 550/920<br>500000 17400 680 21500 2420 500 2350 11750/19000 500 1440/2520<br><br><br>Some comments about the results:<br><br>1. orddict is nice and quadratic as it should be. It is really only
<br>suitable for small trees.<br>2. Seeing I implemented dict I must say that it produced good results<br>and hung on quite well, until about 50000 elements. This is not due to<br>algorithm itself but rather due to the way in which the internal table
<br>is grown. Should probaby be fixed.<br>3. gb_trees is very slow on adding new values, but rather fast on<br>fetching and updating them. I don't know why this is so.<br>4. It is a big win for gb_trees if you know whether you have an existing
<br>key or not when modifying the table.<br>5. I was a bit surprised that red-black trees could match gb_trees in<br>fetching as red-black allows some inbalance in the tree. Perhaps the<br>order of insertion was right for it?
<br>6. (The obvious) There is no best algorithm but you should test your<br>application to find out which is best for you.<br><br><br>It was to help with the last point that dict/orddict have exactly the<br>same interface. The idea was that you could have a whole range of XXdict
<br>libraries and choose the one which best fit your needs, all you need to<br>do to use another is change the module name, a DICT macro.<br><br>I may just complete rbdict and add it to the libraries. This may be the<br>time to extend the interface with more useful functions.
<br><br>I don't know if the has helped any, or just added to the confusion, but<br>at least we have figures. :-) I include my test module, check for any<br>really bad errors.<br><br>Robert<br><br>P.S. I wonder what trapexit will do to these tables? :-)
<br>_______________________________________________<br>erlang-questions mailing list<br><a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br><a href="http://www.erlang.org/mailman/listinfo/erlang-questions">
http://www.erlang.org/mailman/listinfo/erlang-questions</a><br></blockquote></div><br>