[erlang-questions] Wanted additions to the maps module?

Wed May 11 10:55:10 CEST 2016

On Tue, May 10, 2016 at 02:30:19PM +0000, Grzegorz Junka wrote:
> 
> >>>>      https://gist.github.com/amiramix/d43c9a73a6fe6d651d7f
> >>>>
> >>>>      Maps are quite performant but process dictionary is still quicker
> >>>>      and maps are the worst when it comes to consumed memory, taking
> >>>>      twice as much as dict or process dictionary and over 5 times as
> >>>>      much memory as ets.
> >>>>
> >>>> Well, you are comparing apples and oranges. Process dictionary and
> >>>> ETS are something completely different from gb_trees, dict, maps or
> >>>> orddict.
> >> What's the point of comparing gb_trees to gb_trees or maps to maps?
> > Please misunderstand correctly.  Nobody is talking about comparing gb_trees
> > to gb_trees.  What Björn-Egil said was that it is fair to compare any
> > within the group (gb_trees, dict, maps, orddict) with each other since all
> > are functional heap based data structures.  But it is not fair to compare
> > any one in the group with either the process dictionary or ETS.
> >
> > It is also hard to compare the process dictionary with ETS since the
> > process dictionary stores on the process heap while ETS stores in other
> > allocated memory.
> >
> 
> If I need to store many key-value pairs I don't care which group the 
> structure belongs to. I measure and pick the one that is most suitable
> for the job at hand considering all the limitations. Why should I limit 
> the test to comparing only some data structures with some others?

I did not say you should limit your tests.  I just said it is hard to do
a fair comparison.

In fact it would be very interesting to get a usable and understandable
comparison of the different available dictionary types, if possible.

> 
> In my test I am adding the memory consumed by the process and in ETS: 
> process_size(E) + ets:info(Tid, memory) * erlang:system_info(wordsize).
> It's hard but one needs to start somewhere. It's not enough to say, it's
> hard, don't do it.

I did not say "don't do it".  I said it is not a fair comparison,
with the underlying assumption that you only compared heap usage.

But since you apparently have tried to compensate for that unfairness
you just might have a fairly fair comparison.

As Björn-Egil pointed out, though, process_info(self(), memory) might not
be a good metric without a garbage_collect() preceding it, depending on
the application you benchmark for.
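
To make that concrete, here is a minimal sketch of the measurement with the
garbage collection added.  The module and function names (mem_sketch,
total_bytes/1, demo/0) are made up for illustration and are not taken from
the gist; the sketch also assumes the process that owns the data is the one
calling total_bytes/1.

```erlang
-module(mem_sketch).
-export([total_bytes/1, demo/0]).

%% Bytes used by the calling process after a forced garbage collection,
%% plus the bytes allocated for the ETS table Tid.
%% ets:info(Tid, memory) reports words, so it is scaled by the word size.
total_bytes(Tid) ->
    true = erlang:garbage_collect(),
    {memory, ProcBytes} = erlang:process_info(self(), memory),
    EtsWords = ets:info(Tid, memory),
    ProcBytes + EtsWords * erlang:system_info(wordsize).

%% Hypothetical usage: growth in bytes after inserting 10000 small tuples.
demo() ->
    Tid = ets:new(bench, [set]),
    Before = total_bytes(Tid),
    lists:foreach(fun(N) -> ets:insert(Tid, {N, N}) end,
                  lists:seq(1, 10000)),
    After = total_bytes(Tid),
    true = ets:delete(Tid),
    After - Before.
```

Whether forcing a collection is the right thing to do depends on what you
want to measure: it shows the live data, not the peak the process reached
along the way.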

You do not know for certain that the final process size is the maximum
during the test, depending on when garbage collections are triggered.  Have
a look at erlang:system_monitor(Pid, [{large_heap,Size}]).  It might be
relevant in this benchmark.
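
A rough sketch of how that could be wired up follows.  The module name
heap_watch and the function run/2 are hypothetical, and the sketch assumes
the benchmarked code can run in its own process and that nothing else on the
node relies on the system monitor while it runs.  Size is given in words.

```erlang
-module(heap_watch).
-export([run/2]).

%% Run Fun in a fresh process while the caller acts as system monitor.
%% Returns the Info lists of all large_heap events seen for that process.
run(Fun, SizeWords) ->
    Old = erlang:system_monitor(self(), [{large_heap, SizeWords}]),
    try
        {Pid, Ref} = spawn_monitor(Fun),
        collect(Pid, Ref, [])
    after
        %% Old is 'undefined' or {MonitorPid, Options}; both can be
        %% passed back to system_monitor/1 to restore the old settings.
        erlang:system_monitor(Old)
    end.

collect(Pid, Ref, Acc) ->
    receive
        {monitor, Pid, large_heap, Info} ->
            collect(Pid, Ref, [Info | Acc]);
        {'DOWN', Ref, process, Pid, _Reason} ->
            drain(Pid, Acc)
    end.

%% Pick up any large_heap messages already queued behind the 'DOWN'.
drain(Pid, Acc) ->
    receive
        {monitor, Pid, large_heap, Info} ->
            drain(Pid, [Info | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.
```

Each Info is a property list describing the heap after the collection that
triggered the event, so the largest value seen gives a better idea of the
peak than the final process size does.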

> 
> I am not saying that maps are implemented badly or that they are bad 
> data structures. Only that they are not the only data structure that 
> should be considered because everything comes with some limitations and 
> trade-offs.

Precisely.  The limitations and trade-offs make the different alternatives
hard to compare.  (Note that I did not just say impossible, nor that it
should not be done.)  I just said "hard", and it is also hard
(not impossible, and not that it should not be done) to present the results
in an understandable, usable and trustworthy way for others.

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB