[erlang-questions] Wanted additions to the maps module?
Raimo Niskanen
raimo+erlang-questions@REDACTED
Thu May 12 12:12:42 CEST 2016
On Wed, May 11, 2016 at 04:50:43PM +0000, Grzegorz Junka wrote:
>
> On 11/05/2016 15:40, Raimo Niskanen wrote:
> > On Wed, May 11, 2016 at 09:48:41AM +0000, Grzegorz Junka wrote:
> >> On 11/05/2016 08:55, Raimo Niskanen wrote:
> >>> On Tue, May 10, 2016 at 02:30:19PM +0000, Grzegorz Junka wrote:
> >>>> In my test I am adding the memory consumed by the process and in ETS:
> >>>> process_size(E) + ets:info(Tid, memory) * erlang:system_info(wordsize).
> >>>> It's hard but one needs to start somewhere. It's not enough to say it's
> >>>> hard, don't do it.
> >>> I did not say "don't do it". I said it is not a fair comparison,
> >>> with the underlying assumption that you only compared heap usage.
> >>>
> >>> But since you apparently have tried to compensate for that unfairness
> >>> you just might have a fairly fair comparison.
> >>>
> >>> As Björn-Egil pointed out, though, process_info(self(), memory) might not
> >>> be a good metric without a garbage_collect() preceding it, depending on
> >>> the application you benchmark for.
> >>>
> >>> You do not know for certain that the final process size is the maximum
> >>> during the test, depending on when garbage collections are triggered. Have
> >>> a look at erlang:system_monitor(Pid, [{large_heap,Size}]). It might be
> >>> relevant in this benchmark.
> >> I am not that interested in the size of the process containing the data
> >> structure after it has been garbage collected. I want to know how much
> >> garbage the data structure is generating because this directly affects
> >> the size of the VM when many of such processes are running in parallel.
> >> I don't know of a fairer way of measuring that than by getting the size
> >> of the process straight after adding and reading the specified amount of
> >> keys.
> > Then I say that a fairer way to know how much garbage a process creates is
> > to use erlang:trace(WorkerProcess, true,
> > [garbage_collection,{tracer,TracerProcess}]).
> >
> > The messages from that feature can tell you how much garbage is being
> > reclaimed.
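A minimal sketch of such a tracer, with hypothetical helper names (on OTP 18
the garbage_collection trace messages are tagged gc_start and gc_end, and Info
is a list of tagged tuples such as {heap_size, Words}):

    %% Trace garbage collections in Worker and sum up a rough estimate of
    %% the number of words reclaimed (heap size at gc_start minus heap
    %% size at gc_end).
    trace_gc(Worker) ->
        erlang:trace(Worker, true, [garbage_collection, {tracer, self()}]),
        gc_loop(Worker, 0).

    gc_loop(Worker, Reclaimed) ->
        receive
            {trace, Worker, gc_start, Info} ->
                {heap_size, Before} = lists:keyfind(heap_size, 1, Info),
                receive
                    {trace, Worker, gc_end, Info2} ->
                        {heap_size, After} = lists:keyfind(heap_size, 1, Info2),
                        gc_loop(Worker, Reclaimed + (Before - After))
                end;
            stop ->
                Reclaimed
        end.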
>
> I probably don't explain myself clearly. I am not interested in how much
> garbage is reclaimed either. I am interested in the effect of using one
> data structure or the other on the size of the VM. In other words, how
> much on average my VM will grow when I create a process that stores
> 30,000 key-value pairs. This is so that I can estimate which data
> structure will let me store the largest number of key-value pairs given
> a specific amount of physical memory and a reasonable performance of
> looking up existing and storing new pairs. Imagine a huge data store all
> kept in memory, similar to mnesia ram disk table.
I got that!
I am trying to explain that the size of the process after the test is not a
valid measurement point! Since heaps can shrink, the process may have been
larger during the test.
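The system_monitor suggestion above is one way to catch the peak rather than
the final size. A rough sketch with a hypothetical helper; the 10000-word
threshold is arbitrary, and I assume the monitor Info list carries the usual
{heap_size, Words} entry:

    %% Record the largest heap reported for Worker while the benchmark runs;
    %% stops and returns the peak when this process receives 'stop'.
    watch_peak_heap(Worker) ->
        erlang:system_monitor(self(), [{large_heap, 10000}]),
        peak_loop(Worker, 0).

    peak_loop(Worker, Peak) ->
        receive
            {monitor, Worker, large_heap, Info} ->
                {heap_size, Size} = lists:keyfind(heap_size, 1, Info),
                peak_loop(Worker, max(Peak, Size));
            stop ->
                erlang:system_monitor(undefined),
                Peak
        end.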
>
>
> >> Please note that there is no deleting of key-value pairs, only adding
> >> and reading. So any garbage generated is caused by changes to the
> >> internal representation of the table holding the growing amount of
> >> elements. The allocator can't reuse memory chunks released by elements
> >> deleted previously. This may not be applicable to all real-life
> > It can on the contrary reuse such memory chunks.
> >
> > Tuple example:
> > * We start with a tuple of 1000 elements, and a list of 30000 elements to
> > insert in the tuple. Each element uses 3 words plus 2 in the list cons
> > cell, that is 5 * 30000 words, plus the containing tuple of 1001 words,
> > which totals 151001 words. The nearest size above that in the
> > Fibonacci series that the heap algorithm uses is 189886 words.
> > * We then add one element to the tuple. We only need to allocate 1002 words
> > for the new containing tuple, and we have 189886 - 151001 = 38885 so that
> > is no problem.
> > * After creating the new containing tuple the old one is garbage not yet
> > noticed. We continue to add elements and therefore allocate new
> > containing tuples of 1003, 1004, ... words, so at about thirty added
> > elements we run out of heap and allocate a new heap of 307242 (Fibonacci,
> > again) words.
> > * After creating the containing tuple that caused the new heap allocation
> > we garbage collect and then find out that all but the last tuple
> > was needed so we have allocated 307242 but only use 5 * 30000
> > + 1030 = 151030 words. So now we can go on with around (307242 - 151030)
> > / 1030 ~= 150 element insertions before the next heap allocation where we
> > again will allocate the next size 497128 and only use about 5 * 30000 +
> > 1180 = 151180 words. That is 497128 / 151180 ~= 3.3 times more than
> > needed.
> > * If the garbage collect causes the new heap to be less than one fourth
> > of the old heap size the new heap may be _shrunk_.
> >
> > So depending on if you happen to stop your iteration just before or just
> > after a garbage collection, maybe a shrinking one, you will get totally
> > different heap sizes.
> >
> > To further complicate the situation, much of your data, for example the
> > list you start with, will survive two garbage collections and therefore be
> > stored on the old-heap. This also applies to the parts of every tree data
> > structure you benchmark that does not change often. This throws all
> > numbers in my tuple example above out the window.
> >
> > Try increasing Accesses to 300000, and to make the test complete in a
> > reasonable amount of time remove list and orddict from Types, and you will
> > get completely different results from your benchmark. We get, for example,
> > _negative_ results for dict, and map looks like a winner. This kind of
> > wobbling result depending on the scale of the test is a reliable sign of an
> > unreliable benchmark...
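A hypothetical way to watch this growth pattern yourself (not part of the
benchmark) is to grow a tuple one element at a time and print the heap size
whenever it changes:

    heap_growth_demo() ->
        erlang:garbage_collect(),
        grow(erlang:make_tuple(1000, 0), 0, 30000, 0).

    grow(_Tuple, N, N, _LastHeap) ->
        done;
    grow(Tuple, I, N, LastHeap) ->
        %% heap_size is in words and only reflects the current (young) heap
        {heap_size, H} = process_info(self(), heap_size),
        case H of
            LastHeap -> ok;
            _ -> io:format("~w insertions: heap is ~w words~n", [I, H])
        end,
        grow(erlang:append_element(Tuple, I), I + 1, N, H).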
>
> I use the same input data in the same order in all tests in all cycles
> during a single run. I believe there is no randomness in storing the
> same keys in the same data structures, so even if the reallocation
> occurs as you mentioned, it should be deterministically reproducible in
> every cycle. This indeed can be seen in the results, where in three
The randomness is not between different runs of the same data structure; it
is between different data structures, so you cannot compare your result
from dict with the result from gb_trees.
This is because a heap may shrink at some garbage collection and exactly when
that happens depends on how much garbage has been produced, and how much
garbage a certain data structure produces is very hard to predict (pseudorandom).
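If what you want to compare really is the final live data, the noise from
shrink timing can be reduced by doing what Björn-Egil suggested: force a
collection right before reading the size. A minimal sketch, with a
hypothetical helper name:

    %% Run the benchmark fun in the current process, then do a full garbage
    %% collection so that only live data remains when the size (in bytes,
    %% as reported by process_info/2) is read.
    measure(Fun) ->
        Result = Fun(),
        true = erlang:garbage_collect(),
        {memory, Bytes} = process_info(self(), memory),
        {Result, Bytes}.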
> cycles the reported sizes are the same every time. On the next run some
> results will be different, but again the same in all three cycles in
> that single run.
>
> It's interesting that no matter how many times I run the test some of
> the processes end up with just the same amount of memory allocated to
> it. Even if the input data changes, most of the data structures still
> allocate the same amount of memory, e.g.:
The heap sizes are selected from a Fibonacci-like series.
That creates a quantization effect, which is what you are seeing here.
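The series itself can be inspected with erlang:system_info(heap_sizes). A
small sketch (hypothetical helper; the real resize policy is more involved,
but the candidate sizes come from this list):

    %% Smallest candidate heap size (in words) that fits the given number
    %% of live words.
    next_heap_size(NeededWords) ->
        Sizes = lists:sort(erlang:system_info(heap_sizes)),
        hd([S || S <- Sizes, S >= NeededWords]).

With the numbers from the tuple example earlier, next_heap_size(151001) lands
on 189886 if your emulator uses the same series.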
>
> 15335 Hits: ets Time:22068 Size:903056
> 15335 Hits: pd Time:12296 Size:2709360
> 15335 Hits: dict Time:67726 Size:2174112
> 15335 Hits: gb_trees Time:77140 Size:2174112
> 15335 Hits: map_get Time:22862 Size:5091000
> 15335 Hits: map_find Time:23983 Size:5091000
> 15335 Hits: map_match Time:20434 Size:5091000
> 15335 Hits: list Time:19606771 Size:371376
> 15335 Hits: orddict Time:8952194 Size:371376
>
> 15181 Hits: ets Time:21946 Size:900584
> 15181 Hits: pd Time:12655 Size:2709360
> 15181 Hits: dict Time:70246 Size:5091000
> 15181 Hits: gb_trees Time:83523 Size:2174112
> 15181 Hits: map_get Time:26856 Size:5091000
> 15181 Hits: map_find Time:21480 Size:5091000
> 15181 Hits: map_match Time:20160 Size:5091000
> 15181 Hits: list Time:19477574 Size:600904
> 15181 Hits: orddict Time:9250092 Size:600904
>
> 15254 Hits: ets Time:22745 Size:903552
> 15254 Hits: pd Time:12688 Size:2709360
> 15254 Hits: dict Time:70869 Size:2174112
> 15254 Hits: gb_trees Time:83563 Size:2174112
> 15254 Hits: map_get Time:23372 Size:5091000
> 15254 Hits: map_find Time:24381 Size:5091000
> 15254 Hits: map_match Time:20008 Size:5091000
> 15254 Hits: list Time:19669187 Size:600904
> 15254 Hits: orddict Time:9139247 Size:600904
>
> Those are different runs, which can be seen by the Hits count. And yet
> maps and gb_trees take the same amount of memory every time.
>
> Running with 300,000 elements doesn't change much:
>
> 281040 Hits: ets Time:217597 Size:9245448
> 281040 Hits: pd Time:119464 Size:8426240
> 281040 Hits: dict Time:700924 Size:21150800
> 281040 Hits: gb_trees Time:759213 Size:21150800
> 281040 Hits: map_get Time:232013 Size:12874336
> 281040 Hits: map_find Time:217398 Size:12874336
> 281040 Hits: map_match Time:204495 Size:12874336
>
> 280912 Hits: ets Time:221122 Size:9245448
> 280912 Hits: pd Time:116640 Size:8426240
> 280912 Hits: dict Time:705112 Size:21150800
> 280912 Hits: gb_trees Time:732280 Size:21150800
> 280912 Hits: map_get Time:232050 Size:12874336
> 280912 Hits: map_find Time:219269 Size:12874336
> 280912 Hits: map_match Time:207320 Size:12874336
>
> Still the process dictionary is the quickest. Only now dict and gb_trees eat
> up more memory than maps. It might be interesting to draw graphs of
> performance and allocated memory depending on amount of keys, but it
> likely won't change the fact that pd is the best :)
All right. I'll show you mine. I made a small change to your code that
is not hard to guess:
Erlang/OTP 18 [erts-7.2.1] [source-ef34b57] [64-bit] [smp:8:8]
[async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.2.1 (abort with ^G)
1> erl_data_bench:start(30000).
15264 Hits: ets Time:22856 Size:906312
15264 Hits: pd Time:9504 Size:2709360
15264 Hits: dict Time:49149 Size:2174112
15264 Hits: gb_trees Time:57649 Size:2174112
15264 Hits: map_get Time:19387 Size:5091000
15264 Hits: map_find Time:18641 Size:5091000
15264 Hits: map_match Time:16031 Size:5091000
done
2> erl_data_bench:start(40000).
23598 Hits: ets Time:23053 Size:-2149576
23598 Hits: pd Time:11009 Size:163864
23598 Hits: dict Time:63180 Size:5451544
23598 Hits: gb_trees Time:80919 Size:3517792
23598 Hits: map_get Time:23618 Size:5451544
23598 Hits: map_find Time:22407 Size:5451544
23598 Hits: map_match Time:21243 Size:5451544
done
3> erl_data_bench:start(50000).
32398 Hits: ets Time:28390 Size:3602912
32398 Hits: pd Time:15106 Size:4446408
32398 Hits: dict Time:81661 Size:600904
32398 Hits: gb_trees Time:96279 Size:-1573208
32398 Hits: map_get Time:29142 Size:5451544
32398 Hits: map_find Time:28677 Size:5451544
32398 Hits: map_match Time:28207 Size:5451544
done
4> erl_data_bench:start(60000).
42011 Hits: ets Time:35965 Size:4610872
42011 Hits: pd Time:20392 Size:3845496
42011 Hits: dict Time:90501 Size:5451544
42011 Hits: gb_trees Time:112758 Size:3517792
42011 Hits: map_get Time:33981 Size:5451544
42011 Hits: map_find Time:33467 Size:5451544
42011 Hits: map_match Time:32018 Size:5451544
done
5> erl_data_bench:start(70000).
51491 Hits: ets Time:40139 Size:-1433664
51491 Hits: pd Time:17873 Size:327704
51491 Hits: dict Time:106166 Size:972288
51491 Hits: gb_trees Time:132816 Size:2906040
51491 Hits: map_get Time:42655 Size:4505448
51491 Hits: map_find Time:40136 Size:4505448
51491 Hits: map_match Time:38782 Size:4505448
done
6> erl_data_bench:start(80000).
61551 Hits: ets Time:46722 Size:2459576
61551 Hits: pd Time:22964 Size:3233744
61551 Hits: dict Time:120808 Size:972288
61551 Hits: gb_trees Time:150940 Size:2906040
61551 Hits: map_get Time:47957 Size:4505448
61551 Hits: map_find Time:46025 Size:4505448
61551 Hits: map_match Time:44137 Size:4505448
done
7> erl_data_bench:start(90000).
71110 Hits: ets Time:53009 Size:4042048
71110 Hits: pd Time:27188 Size:2632832
71110 Hits: dict Time:133803 Size:2906040
71110 Hits: gb_trees Time:176076 Size:6424736
71110 Hits: map_get Time:52987 Size:4505448
71110 Hits: map_find Time:51744 Size:4505448
71110 Hits: map_match Time:49109 Size:4505448
done
8> erl_data_bench:start(100000).
81274 Hits: ets Time:61566 Size:4045352
81274 Hits: pd Time:29819 Size:3233744
81274 Hits: dict Time:143565 Size:6424736
81274 Hits: gb_trees Time:184181 Size:6424736
81274 Hits: map_get Time:55400 Size:4505448
81274 Hits: map_find Time:53725 Size:4505448
81274 Hits: map_match Time:52562 Size:4505448
done
9> erl_data_bench:start(300000).
280898 Hits: ets Time:173624 Size:9245520
280898 Hits: pd Time:83201 Size:8426240
280898 Hits: dict Time:443979 Size:-771032
280898 Hits: gb_trees Time:552626 Size:21150800
280898 Hits: map_get Time:163563 Size:12874336
280898 Hits: map_find Time:160153 Size:12874336
280898 Hits: map_match Time:154237 Size:12874336
done
Notice that ets, gb_trees and dict managed to get negative numbers, that
the numbers fluctuate widely for small increases in the value of
Accesses, and that which data type looks "best" changes as well. For 60000
and 100000 map_* looks good, but not so much for 80000.
I'd say that your benchmark does not produce reliable numbers...
That the process dictionary performs well is not a surprise since it is a
destructive data structure. The only garbage it produces is the data items
that are replaced (or deleted); there is no container that can become
garbage.
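A hypothetical illustration of the difference (not the benchmark code):

    %% put/2 overwrites the stored value in place and returns the old one,
    %% so only the replaced value can become garbage ...
    pd_store(Key, Value) ->
        _Previous = put(Key, Value),
        ok.

    %% ... while maps:put/3 returns a new map term; the old map, container
    %% and all, becomes garbage once nothing references it any more.
    map_store(Key, Value, Map) ->
        maps:put(Key, Value, Map).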
>
> Grzegorz
>
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB