[erlang-questions] Wanted additions to the maps module?

Wed May 11 18:50:43 CEST 2016

On 11/05/2016 15:40, Raimo Niskanen wrote:
> On Wed, May 11, 2016 at 09:48:41AM +0000, Grzegorz Junka wrote:
>> On 11/05/2016 08:55, Raimo Niskanen wrote:
>>> On Tue, May 10, 2016 at 02:30:19PM +0000, Grzegorz Junka wrote:
>>>> In my test I am adding the memory consumed by the process and in ETS:
>>>> process_size(E) + ets:info(Tid, memory) * erlang:system_info(wordsize).
>>>> It's hard but one need to start somewhere. It's not enough to say, it's
>>>> hard, don't do it.
>>> I did not say "don't do it".  I said it is not a fair comparision,
>>> with the underlying assumption that you only compared heap usage.
>>>
>>> But since you apparently have tried to compensate for that unfairness
>>> you just might have a fairly fair comparision.
>>>
>>> As Björn-Egil pointed out, though, process_info(self(), memory) might not
>>> be a good metric without a garbage_collect() preceeding it, depending on
>>> the application you benchmark for.
>>>
>>> You do not know for certain that the final process size is the maximum
>>> during the test, depending on when garbage collections are triggered.  Have
>>> a look at erlang:system_monitor(Pid, [{large_heap,Size}]).  It might be
>>> relevant in this benchmark.
>> I am not that interested in the size of the process containing the data
>> structure after it has been garbage collected. I want to know how much
>> garbage the data structure is generating because this directly affects
>> the size of the VM when many of such processes are running in parallel.
>> I don't know of a fairer way of measuring that than by getting the size
>> of the process straight after adding and reading the specified amount of
>> keys.
> Then I say that a fairer way to know how much garbage a process creates is
> to use erlang:trace(WorkerProcess, true,
>                      [garbage_collection,{tracer,TracerProcess}]).
>
> The messages from that feature can tell you how much garbage is being
> reclaimed.

I probably don't explain myself clearly. I am not interested in how much 
garbage is reclaimed either. I am interested in the effect of using one 
data structure or the other on the size of the VM. In other words, how 
much in average will my VM grow when I create a process that stores 
30,000 key value pairs. This is so that I can estimate which data 
structure will let me store the biggest amount of key-value pairs given 
a specific amount of physical memory and a reasonable performance of 
looking up existing and storing new pairs. Imagine a huge data store all 
kept in memory, similar to mnesia ram disk table.

>> Please note that there is no deleting of key-value pairs, only adding
>> and reading. So any garbage generated is caused by changes to the
>> internal representation of the table holding the growing amount of
>> elements. The allocator can't reuse memory chunks released by elements
>> deleted previously. This may not be applicable to all real-life
> It can on the contrary reuse such memory chunks.
>
> Tuple example:
> * We start with a tuple of 1000 elements, and a list of 30000 elements to
>    insert in the tuple.  Each element uses 3 words plus 2 in the list cons
>    cell that is 5 * 30000 words plus the containing tuple of 1001 words
>    that is 151001 words.  The nearest above in the Fibonacci series that
>    the heap algorithm uses is 189886 words.
> * We then add one element to the tuple. We only need to allocate 1002 words
>    for the new containing tuple, and we have 189886 - 151001 = 38885 so that
>    is no problem.
> * After creating the new containing tuple the old one is garbage not yet
>    noticed.  We continue to add elements and therefore allocate new
>    containing tuples of 1003, 1004, ... words, so at about thirty added
>    elements we run out of heap and allocate a new heap of 307242 (Fibonacci,
>    again) words.
> * After creating the containing tuple that caused the new heap allocation
>    we garbage collect and then find out that all but the last tuple
>    was needed so we have allocated 307242 but only use 5 * 30000
>    + 1030 = 151030 words. So now we can go on with around (307242 - 151030)
>    / 1030 ~= 150 element insertions before the next heap allocation where we
>    again will allocate the next size 497128 and only use about 5 * 30000 +
>    1180 = 151180 words.  That is 497128 / 151180 ~= 3.3 times more than
>    needed.
> * If the garbage collect causes the new heap to be less than one fourth
>    of the old heap size the new heap may be _shrunk_.
>
> So depending on if you happen to stop your iteration just before or just
> after a garbage collection, maybe a shrinking one, you will get totally
> different heap sizes.
>
> To furthermore complicate the situation much of your data for example the
> list you start with will survive two garbage collections and therefore be
> stored on the old-heap.  This also applies to the parts of every tree data
> structure you benchmark that does not change often.  This throws all
> numbers in my tuple example above out the window.
>
> Try increasing Accesses to 300000, and to make the test complete in a
> reasonable amount of time remove list and orddict from Types, and you will
> get completely different results of your benchmark.  We get for example
> _negative_ results for dict and map looks like a winner.  This kind of
> wobbling result depending on the scale of the test is a reliable sign of an
> unreliable benchmark...

I use the same input data in the same order in all tests in all cycles 
during a single run. I believe there is no randomness in storing the 
same keys in the same data structures, so even if the reallocation 
occurs as you mentioned, it should be deterministically reproducible in 
every cycle. This indeed can be seen in the results, where in three 
cycles the reported sizes are the same every time. On the next run some 
results will be different, but again the same in all three cycles in 
that single run.

It's interesting that no matter how many times I run the test some of 
the processes end up with just the same amount of memory allocated to 
it. Even if the input data changes, most of the data structures still 
allocate the same amount of memory, e.g.:

15335 Hits: ets         Time:22068      Size:903056
15335 Hits: pd          Time:12296      Size:2709360
15335 Hits: dict        Time:67726      Size:2174112
15335 Hits: gb_trees    Time:77140      Size:2174112
15335 Hits: map_get     Time:22862      Size:5091000
15335 Hits: map_find    Time:23983      Size:5091000
15335 Hits: map_match   Time:20434      Size:5091000
15335 Hits: list        Time:19606771   Size:371376
15335 Hits: orddict     Time:8952194    Size:371376

15181 Hits: ets         Time:21946      Size:900584
15181 Hits: pd          Time:12655      Size:2709360
15181 Hits: dict        Time:70246      Size:5091000
15181 Hits: gb_trees    Time:83523      Size:2174112
15181 Hits: map_get     Time:26856      Size:5091000
15181 Hits: map_find    Time:21480      Size:5091000
15181 Hits: map_match   Time:20160      Size:5091000
15181 Hits: list        Time:19477574   Size:600904
15181 Hits: orddict     Time:9250092    Size:600904

15254 Hits: ets         Time:22745      Size:903552
15254 Hits: pd          Time:12688      Size:2709360
15254 Hits: dict        Time:70869      Size:2174112
15254 Hits: gb_trees    Time:83563      Size:2174112
15254 Hits: map_get     Time:23372      Size:5091000
15254 Hits: map_find    Time:24381      Size:5091000
15254 Hits: map_match   Time:20008      Size:5091000
15254 Hits: list        Time:19669187   Size:600904
15254 Hits: orddict     Time:9139247    Size:600904

Those are different runs, which can be seen by the Hits count. And yet 
maps and gb_trees take the same amount of memory every time.

Running with 300,000 elements doesn't change much:

281040 Hits: ets         Time:217597     Size:9245448
281040 Hits: pd          Time:119464     Size:8426240
281040 Hits: dict        Time:700924     Size:21150800
281040 Hits: gb_trees    Time:759213     Size:21150800
281040 Hits: map_get     Time:232013     Size:12874336
281040 Hits: map_find    Time:217398     Size:12874336
281040 Hits: map_match   Time:204495     Size:12874336

280912 Hits: ets         Time:221122     Size:9245448
280912 Hits: pd          Time:116640     Size:8426240
280912 Hits: dict        Time:705112     Size:21150800
280912 Hits: gb_trees    Time:732280     Size:21150800
280912 Hits: map_get     Time:232050     Size:12874336
280912 Hits: map_find    Time:219269     Size:12874336
280912 Hits: map_match   Time:207320     Size:12874336

Still process dictionary is the quickest. Only now dict and gb_tress eat 
up more memory than maps. It might be interesting to draw graphs of 
performance and allocated memory depending on amount of keys, but it 
likely won't change the fact that pd is the best :)

Grzegorz