[erlang-questions] Wanted additions to the maps module?

Wed May 11 11:48:41 CEST 2016

On 11/05/2016 08:55, Raimo Niskanen wrote:
> On Tue, May 10, 2016 at 02:30:19PM +0000, Grzegorz Junka wrote:
>> In my test I am adding the memory consumed by the process and in ETS:
>> process_size(E) + ets:info(Tid, memory) * erlang:system_info(wordsize).
>> It's hard, but one needs to start somewhere. It's not enough to say
>> "it's hard, don't do it".
> I did not say "don't do it".  I said it is not a fair comparison,
> with the underlying assumption that you only compared heap usage.
>
> But since you apparently have tried to compensate for that unfairness
> you just might have a fairly fair comparison.
>
> As Björn-Egil pointed out, though, process_info(self(), memory) might not
> be a good metric without a garbage_collect() preceding it, depending on
> the application you benchmark for.
>
> You do not know for certain that the final process size is the maximum
> during the test, depending on when garbage collections are triggered.  Have
> a look at erlang:system_monitor(Pid, [{large_heap,Size}]).  It might be
> relevant in this benchmark.

I am not that interested in the size of the process containing the data
structure after it has been garbage collected. I want to know how much
garbage the data structure generates, because this directly affects
the size of the VM when many such processes are running in parallel.
I don't know of a fairer way of measuring that than taking the size
of the process straight after adding and reading the specified number
of keys.
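
A minimal sketch of that kind of measurement, in case it helps the
discussion. The module name mem_probe and both function names are
made up for illustration and are not the code from my test. Each
function fills the structure in a fresh process, reads every key back,
and reports bytes without calling garbage_collect() first. The ETS
variant adds the table size, which ets:info/2 reports in words.

```erlang
-module(mem_probe).
-export([map_bytes/1, ets_bytes/1]).

%% Hypothetical helper: add N keys to a map in a fresh process, read them
%% all back, and return the process size in bytes with no forced GC.
map_bytes(N) ->
    Parent = self(),
    Ref = make_ref(),
    spawn_link(fun() ->
        Keys = lists:seq(1, N),
        Map = lists:foldl(fun(K, M) -> maps:put(K, K, M) end, #{}, Keys),
        lists:foreach(fun(K) -> K = maps:get(K, Map) end, Keys),
        {memory, ProcBytes} = erlang:process_info(self(), memory),
        Parent ! {Ref, ProcBytes}
    end),
    receive {Ref, Total} -> Total end.

%% Hypothetical helper: same workload against a private ETS table. The
%% result is process bytes plus table words converted to bytes.
ets_bytes(N) ->
    Parent = self(),
    Ref = make_ref(),
    spawn_link(fun() ->
        Keys = lists:seq(1, N),
        Tid = ets:new(probe, [set, private]),
        lists:foreach(fun(K) -> true = ets:insert(Tid, {K, K}) end, Keys),
        lists:foreach(fun(K) -> [{K, K}] = ets:lookup(Tid, K) end, Keys),
        {memory, ProcBytes} = erlang:process_info(self(), memory),
        TableBytes = ets:info(Tid, memory) * erlang:system_info(wordsize),
        Parent ! {Ref, ProcBytes + TableBytes}
    end),
    receive {Ref, Total} -> Total end.
```

Running each measurement in its own process keeps leftovers from one
run out of the next one's heap.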

Sure, the size of the process may shrink at some later time, but in a
real application, where I don't call gc explicitly, the process may
also stay larger for a long time before gc shrinks it.

Please note that there is no deleting of key-value pairs, only adding
and reading. So any garbage generated is caused by changes to the
internal representation of the table as it holds a growing number of
elements. Since nothing is deleted, the allocator has no memory chunks
released by previously deleted elements to reuse. This may not apply
to all real-life scenarios, but it should more accurately reflect the
behaviour where the data structure holds many key-value pairs and the
application rarely changes existing pairs and mostly adds new ones.

Grzegorz