[erlang-questions] Wanted additions to the maps module?
Raimo Niskanen
raimo+erlang-questions@REDACTED
Thu May 12 12:12:42 CEST 2016
On Wed, May 11, 2016 at 04:50:43PM +0000, Grzegorz Junka wrote:
>
> On 11/05/2016 15:40, Raimo Niskanen wrote:
> > On Wed, May 11, 2016 at 09:48:41AM +0000, Grzegorz Junka wrote:
> >> On 11/05/2016 08:55, Raimo Niskanen wrote:
> >>> On Tue, May 10, 2016 at 02:30:19PM +0000, Grzegorz Junka wrote:
> >>>> In my test I am adding the memory consumed by the process and in ETS:
> >>>> process_size(E) + ets:info(Tid, memory) * erlang:system_info(wordsize).
> >>>> It's hard but one needs to start somewhere. It's not enough to say it's
> >>>> hard, don't do it.
> >>> I did not say "don't do it". I said it is not a fair comparison,
> >>> with the underlying assumption that you only compared heap usage.
> >>>
> >>> But since you apparently have tried to compensate for that unfairness
> >>> you just might have a fairly fair comparison.
> >>>
> >>> As Björn-Egil pointed out, though, process_info(self(), memory) might not
> >>> be a good metric without a garbage_collect() preceding it, depending on
> >>> the application you benchmark for.
> >>>
> >>> You do not know for certain that the final process size is the maximum
> >>> during the test, depending on when garbage collections are triggered. Have
> >>> a look at erlang:system_monitor(Pid, [{large_heap,Size}]). It might be
> >>> relevant in this benchmark.
> >> I am not that interested in the size of the process containing the data
> >> structure after it has been garbage collected. I want to know how much
> >> garbage the data structure is generating because this directly affects
> >> the size of the VM when many of such processes are running in parallel.
> >> I don't know of a fairer way of measuring that than by getting the size
> >> of the process straight after adding and reading the specified amount of
> >> keys.
> > Then I say that a fairer way to know how much garbage a process creates is
> > to use erlang:trace(WorkerProcess, true,
> > [garbage_collection,{tracer,TracerProcess}]).
> >
> > The messages from that feature can tell you how much garbage is being
> > reclaimed.
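A minimal sketch of such a tracer, with hypothetical helper names (on OTP 18
the garbage_collection trace messages are tagged gc_start and gc_end, and Info
is a list of tagged tuples such as {heap_size, Words}):

    %% Trace garbage collections in Worker and sum up a rough estimate of
    %% the number of words reclaimed (heap size at gc_start minus heap
    %% size at gc_end).
    trace_gc(Worker) ->
        erlang:trace(Worker, true, [garbage_collection, {tracer, self()}]),
        gc_loop(Worker, 0).

    gc_loop(Worker, Reclaimed) ->
        receive
            {trace, Worker, gc_start, Info} ->
                {heap_size, Before} = lists:keyfind(heap_size, 1, Info),
                receive
                    {trace, Worker, gc_end, Info2} ->
                        {heap_size, After} = lists:keyfind(heap_size, 1, Info2),
                        gc_loop(Worker, Reclaimed + (Before - After))
                end;
            stop ->
                Reclaimed
        end.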
>
> I probably don't explain myself clearly. I am not interested in how much
> garbage is reclaimed either. I am interested in the effect of using one
> data structure or the other on the size of the VM. In other words, how
> much on average my VM will grow when I create a process that stores
> 30,000 key-value pairs. This is so that I can estimate which data
> structure will let me store the largest number of key-value pairs given
> a specific amount of physical memory and a reasonable performance of
> looking up existing and storing new pairs. Imagine a huge data store all
> kept in memory, similar to mnesia ram disk table.
I got that!
I am trying to explain that the size of the process after the test is not a
valid measurement point! Since heaps can shrink, the process may have been
larger during the test.
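The system_monitor suggestion above is one way to catch the peak rather than
the final size. A rough sketch with a hypothetical helper; the 10000-word
threshold is arbitrary, and I assume the monitor Info list carries the usual
{heap_size, Words} entry:

    %% Record the largest heap reported for Worker while the benchmark runs;
    %% stops and returns the peak when this process receives 'stop'.
    watch_peak_heap(Worker) ->
        erlang:system_monitor(self(), [{large_heap, 10000}]),
        peak_loop(Worker, 0).

    peak_loop(Worker, Peak) ->
        receive
            {monitor, Worker, large_heap, Info} ->
                {heap_size, Size} = lists:keyfind(heap_size, 1, Info),
                peak_loop(Worker, max(Peak, Size));
            stop ->
                erlang:system_monitor(undefined),
                Peak
        end.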
>
>
> >> Please note that there is no deleting of key-value pairs, only adding
> >> and reading. So any garbage generated is caused by changes to the
> >> internal representation of the table holding the growing amount of
> >> elements. The allocator can't reuse memory chunks released by elements
> >> deleted previously. This may not be applicable to all real-life
> > It can on the contrary reuse such memory chunks.
> >
> > Tuple example:
> > * We start with a tuple of 1000 elements, and a list of 30000 elements to
> > insert in the tuple. Each element uses 3 words plus 2 in the list cons
> > cell, that is 5 * 30000 words, plus the containing tuple of 1001 words,
> > which totals 151001 words. The nearest size above that in the
> > Fibonacci series that the heap algorithm uses is 189886 words.
> > * We then add one element to the tuple. We only need to allocate 1002 words
> > for the new containing tuple, and we have 189886 - 151001 = 38885 so that
> > is no problem.
> > * After creating the new containing tuple the old one is garbage not yet
> > noticed. We continue to add elements and therefore allocate new
> > containing tuples of 1003, 1004, ... words, so at about thirty added
> > elements we run out of heap and allocate a new heap of 307242 (Fibonacci,
> > again) words.
> > * After creating the containing tuple that caused the new heap allocation
> > we garbage collect and then find out that all but the last tuple
> > was needed so we have allocated 307242 but only use 5 * 30000
> > + 1030 = 151030 words. So now we can go on with around (307242 - 151030)
> > / 1030 ~= 150 element insertions before the next heap allocation where we
> > again will allocate the next size 497128 and only use about 5 * 30000 +
> > 1180 = 151180 words. That is 497128 / 151180 ~= 3.3 times more than
> > needed.
> > * If the garbage collect causes the new heap to be less than one fourth
> > of the old heap size the new heap may be _shrunk_.
> >
> > So depending on if you happen to stop your iteration just before or just
> > after a garbage collection, maybe a shrinking one, you will get totally
> > different heap sizes.
> >
> > To further complicate the situation, much of your data, for example the
> > list you start with, will survive two garbage collections and therefore be
> > stored on the old-heap. This also applies to the parts of every tree data
> > structure you benchmark that does not change often. This throws all
> > numbers in my tuple example above out the window.
> >
> > Try increasing Accesses to 300000, and to make the test complete in a
> > reasonable amount of time remove list and orddict from Types, and you will
> > get completely different results from your benchmark. We get, for example,
> > _negative_ results for dict, and map looks like a winner. This kind of
> > wobbling result depending on the scale of the test is a reliable sign of an
> > unreliable benchmark...
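A hypothetical way to watch this growth pattern yourself (not part of the
benchmark) is to grow a tuple one element at a time and print the heap size
whenever it changes:

    heap_growth_demo() ->
        erlang:garbage_collect(),
        grow(erlang:make_tuple(1000, 0), 0, 30000, 0).

    grow(_Tuple, N, N, _LastHeap) ->
        done;
    grow(Tuple, I, N, LastHeap) ->
        %% heap_size is in words and only reflects the current (young) heap
        {heap_size, H} = process_info(self(), heap_size),
        case H of
            LastHeap -> ok;
            _ -> io:format("~w insertions: heap is ~w words~n", [I, H])
        end,
        grow(erlang:append_element(Tuple, I), I + 1, N, H).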
>
> I use the same input data in the same order in all tests in all cycles
> during a single run. I believe there is no randomness in storing the
> same keys in the same data structures, so even if the reallocation
> occurs as you mentioned, it should be deterministically reproducible in
> every cycle. This indeed can be seen in the results, where in three
The randomness is not between different runs of the same data structure; it
is between different data structures, so you cannot compare your result
from dict with the result from gb_trees.
This is because a heap may shrink at some garbage collection and exactly when
that happens depends on how much garbage has been produced, and how much
garbage a certain data structure produces is very hard to predict (pseudorandom).
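If what you want to compare really is the final live data, the noise from
shrink timing can be reduced by doing what Björn-Egil suggested: force a
collection right before reading the size. A minimal sketch, with a
hypothetical helper name:

    %% Run the benchmark fun in the current process, then do a full garbage
    %% collection so that only live data remains when the size (in bytes,
    %% as reported by process_info/2) is read.
    measure(Fun) ->
        Result = Fun(),
        true = erlang:garbage_collect(),
        {memory, Bytes} = process_info(self(), memory),
        {Result, Bytes}.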
> cycles the reported sizes are the same every time. On the next run some
> results will be different, but again the same in all three cycles in
> that single run.
>
> It's interesting that no matter how many times I run the test some of
> the processes end up with just the same amount of memory allocated to
> it. Even if the input data changes, most of the data structures still
> allocate the same amount of memory, e.g.:
The heap sizes are selected from a Fibonacci-like series.
That creates a quantization effect, which is what you are seeing here.
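The series itself can be inspected with erlang:system_info(heap_sizes). A
small sketch (hypothetical helper; the real resize policy is more involved,
but the candidate sizes come from this list):

    %% Smallest candidate heap size (in words) that fits the given number
    %% of live words.
    next_heap_size(NeededWords) ->
        Sizes = lists:sort(erlang:system_info(heap_sizes)),
        hd([S || S <- Sizes, S >= NeededWords]).

With the numbers from the tuple example earlier, next_heap_size(151001) lands
on 189886 if your emulator uses the same series.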
>
> 15335 Hits: ets Time:22068 Size:903056
> 15335 Hits: pd Time:12296 Size:2709360
> 15335 Hits: dict Time:67726 Size:2174112
> 15335 Hits: gb_trees Time:77140 Size:2174112
> 15335 Hits: map_get Time:22862 Size:5091000
> 15335 Hits: map_find Time:23983 Size:5091000
> 15335 Hits: map_match Time:20434 Size:5091000
> 15335 Hits: list Time:19606771 Size:371376
> 15335 Hits: orddict Time:8952194 Size:371376
>
> 15181 Hits: ets Time:21946 Size:900584
> 15181 Hits: pd Time:12655 Size:2709360
> 15181 Hits: dict Time:70246 Size:5091000
> 15181 Hits: gb_trees Time:83523 Size:2174112
> 15181 Hits: map_get Time:26856 Size:5091000
> 15181 Hits: map_find Time:21480 Size:5091000
> 15181 Hits: map_match Time:20160 Size:5091000
> 15181 Hits: list Time:19477574 Size:600904
> 15181 Hits: orddict Time:9250092 Size:600904
>
> 15254 Hits: ets Time:22745 Size:903552
> 15254 Hits: pd Time:12688 Size:2709360
> 15254 Hits: dict Time:70869 Size:2174112
> 15254 Hits: gb_trees Time:83563 Size:2174112
> 15254 Hits: map_get Time:23372 Size:5091000
> 15254 Hits: map_find Time:24381 Size:5091000
> 15254 Hits: map_match Time:20008 Size:5091000
> 15254 Hits: list Time:19669187 Size:600904
> 15254 Hits: orddict Time:9139247 Size:600904
>
> Those are different runs, which can be seen by the Hits count. And yet
> maps and gb_trees take the same amount of memory every time.
>
> Running with 300,000 elements doesn't change much:
>
> 281040 Hits: ets Time:217597 Size:9245448
> 281040 Hits: pd Time:119464 Size:8426240
> 281040 Hits: dict Time:700924 Size:21150800
> 281040 Hits: gb_trees Time:759213 Size:21150800
> 281040 Hits: map_get Time:232013 Size:12874336
> 281040 Hits: map_find Time:217398 Size:12874336
> 281040 Hits: map_match Time:204495 Size:12874336
>
> 280912 Hits: ets Time:221122 Size:9245448
> 280912 Hits: pd Time:116640 Size:8426240
> 280912 Hits: dict Time:705112 Size:21150800
> 280912 Hits: gb_trees Time:732280 Size:21150800
> 280912 Hits: map_get Time:232050 Size:12874336
> 280912 Hits: map_find Time:219269 Size:12874336
> 280912 Hits: map_match Time:207320 Size:12874336
>
> Still the process dictionary is the quickest. Only now dict and gb_trees eat
> up more memory than maps. It might be interesting to draw graphs of
> performance and allocated memory depending on amount of keys, but it
> likely won't change the fact that pd is the best :)
All right. I'll show you mine. I made a small change to your code that
is not hard to guess:
Erlang/OTP 18 [erts-7.2.1] [source-ef34b57] [64-bit] [smp:8:8]
[async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.2.1 (abort with ^G)
1> erl_data_bench:start(30000).
15264 Hits: ets Time:22856 Size:906312
15264 Hits: pd Time:9504 Size:2709360
15264 Hits: dict Time:49149 Size:2174112
15264 Hits: gb_trees Time:57649 Size:2174112
15264 Hits: map_get Time:19387 Size:5091000
15264 Hits: map_find Time:18641 Size:5091000
15264 Hits: map_match Time:16031 Size:5091000
done
2> erl_data_bench:start(40000).
23598 Hits: ets Time:23053 Size:-2149576
23598 Hits: pd Time:11009 Size:163864
23598 Hits: dict Time:63180 Size:5451544
23598 Hits: gb_trees Time:80919 Size:3517792
23598 Hits: map_get Time:23618 Size:5451544
23598 Hits: map_find Time:22407 Size:5451544
23598 Hits: map_match Time:21243 Size:5451544
done
3> erl_data_bench:start(50000).
32398 Hits: ets Time:28390 Size:3602912
32398 Hits: pd Time:15106 Size:4446408
32398 Hits: dict Time:81661 Size:600904
32398 Hits: gb_trees Time:96279 Size:-1573208
32398 Hits: map_get Time:29142 Size:5451544
32398 Hits: map_find Time:28677 Size:5451544
32398 Hits: map_match Time:28207 Size:5451544
done
4> erl_data_bench:start(60000).
42011 Hits: ets Time:35965 Size:4610872
42011 Hits: pd Time:20392 Size:3845496
42011 Hits: dict Time:90501 Size:5451544
42011 Hits: gb_trees Time:112758 Size:3517792
42011 Hits: map_get Time:33981 Size:5451544
42011 Hits: map_find Time:33467 Size:5451544
42011 Hits: map_match Time:32018 Size:5451544
done
5> erl_data_bench:start(70000).
51491 Hits: ets Time:40139 Size:-1433664
51491 Hits: pd Time:17873 Size:327704
51491 Hits: dict Time:106166 Size:972288
51491 Hits: gb_trees Time:132816 Size:2906040
51491 Hits: map_get Time:42655 Size:4505448
51491 Hits: map_find Time:40136 Size:4505448
51491 Hits: map_match Time:38782 Size:4505448
done
6> erl_data_bench:start(80000).
61551 Hits: ets Time:46722 Size:2459576
61551 Hits: pd Time:22964 Size:3233744
61551 Hits: dict Time:120808 Size:972288
61551 Hits: gb_trees Time:150940 Size:2906040
61551 Hits: map_get Time:47957 Size:4505448
61551 Hits: map_find Time:46025 Size:4505448
61551 Hits: map_match Time:44137 Size:4505448
done
7> erl_data_bench:start(90000).
71110 Hits: ets Time:53009 Size:4042048
71110 Hits: pd Time:27188 Size:2632832
71110 Hits: dict Time:133803 Size:2906040
71110 Hits: gb_trees Time:176076 Size:6424736
71110 Hits: map_get Time:52987 Size:4505448
71110 Hits: map_find Time:51744 Size:4505448
71110 Hits: map_match Time:49109 Size:4505448
done
8> erl_data_bench:start(100000).
81274 Hits: ets Time:61566 Size:4045352
81274 Hits: pd Time:29819 Size:3233744
81274 Hits: dict Time:143565 Size:6424736
81274 Hits: gb_trees Time:184181 Size:6424736
81274 Hits: map_get Time:55400 Size:4505448
81274 Hits: map_find Time:53725 Size:4505448
81274 Hits: map_match Time:52562 Size:4505448
done
9> erl_data_bench:start(300000).
280898 Hits: ets Time:173624 Size:9245520
280898 Hits: pd Time:83201 Size:8426240
280898 Hits: dict Time:443979 Size:-771032
280898 Hits: gb_trees Time:552626 Size:21150800
280898 Hits: map_get Time:163563 Size:12874336
280898 Hits: map_find Time:160153 Size:12874336
280898 Hits: map_match Time:154237 Size:12874336
done
Notice that ets, gb_trees and dict managed to get negative numbers, that
the numbers fluctuate widely for small increases in the value of
Accesses, and that which data type looks "best" changes as well. For 60000
and 100000 map_* looks good, but not so much for 80000.
I'd say that your benchmark does not produce reliable numbers...
That the process dictionary performs well is not a surprise since it is a
destructive data structure. The only garbage it produces is the data items
that are replaced (or deleted); there is no container that can become
garbage.
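A hypothetical illustration of the difference (not the benchmark code):

    %% put/2 overwrites the stored value in place and returns the old one,
    %% so only the replaced value can become garbage ...
    pd_store(Key, Value) ->
        _Previous = put(Key, Value),
        ok.

    %% ... while maps:put/3 returns a new map term; the old map, container
    %% and all, becomes garbage once nothing references it any more.
    map_store(Key, Value, Map) ->
        maps:put(Key, Value, Map).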
>
> Grzegorz
>
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB