[erlang-questions] simple question about list memory consumption

Ulf Wiger ulf@REDACTED
Mon Jul 9 14:13:46 CEST 2012


On 9 Jul 2012, at 12:34, CGS wrote:

> > About the two lists you were speaking about (I suppose you were referring to the output of lists:map/2 and lists:seq/2), the second is destroyed as soon as lists:map/2 exits.
> 
> No it isn't.  It is only reclaimed when the garbage collector runs.
> 
> That is a bit puzzling. Looking at the code for lists:map/2:
> 
> map(F, [H|T]) ->
>     [F(H)|map(F, T)];
> map(F, []) when is_function(F, 1) -> [].
> 
> I can see that when lists:map/2 exits, the only remaining list is an empty list. I might be wrong, but I see only one word remaining for the garbage collector to take care of. That means that, at the end, there is only 1 full list remaining (1 word can really be neglected compared with the 214 MB I got for a 10^7-character list).

I believe Richard was referring to:

a) the list created using lists:seq(1,10000000)
b) the list that results from the call to lists:map/2

Actually, when you do this in the shell, the second list will be remembered in the shell's result history (as well as in the variable bindings). The first list will, as Richard said, disappear when the garbage collector runs.
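
(Side note: to make the shell let go of a result like this, something along these lines should do it - f/1 forgets the variable binding and results/1 controls how many old results the shell keeps:

f(L).           %% forget the binding L
results(0).     %% keep no old results in the shell's history

After that, the list becomes garbage the next time the shell process is collected.)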

The quickest and most precise way of finding out how much RAM is needed for a particular data structure is:

Eshell V5.9  (abort with ^G)
1> L = lists:map(fun(_) -> 107 end,lists:seq(1,10000000)), ok.
ok
2> erts_debug:flat_size(L).
20000000

In this case, 20M words, which is very much in line with the documentation.
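
To turn that into bytes (a quick sketch: erlang:system_info(wordsize) reports the emulator's word size, which is 8 on a 64-bit VM):

WordSize = erlang:system_info(wordsize),      %% 8 bytes per word on a 64-bit emulator
erts_debug:flat_size(L) * WordSize.           %% 20000000 * 8 = 160000000 bytes, i.e. ~160 MB

Two words per element - 1 word for the list cell plus 1 word for the small integer - which matches the figures in the Efficiency Guide.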

If you want to figure out how much memory is actually used by the VM, this is more complicated. It depends on how the data structure was built - i.e. how it exercised the garbage collector - and how much temporary fragmentation this caused. The Erlang VM is not designed to keep the memory footprint as small as possible; rather, it tries to maintain a reasonably low level of fragmentation in a non-stop system without ruining the soft-real-time characteristics. If you perform an operation such as creating a list with 10M elements, and from that creating another list of similar size, this can be pretty hard on Erlang's garbage collector, since it can't predict when you are going to quit accumulating data.
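
If you just want to see what the emulator itself reports, these calls give a rough picture (the numbers depend on allocator settings and on what the garbage collector has done so far, so treat them as indicative only):

erlang:memory(total).    %% total number of bytes currently allocated by the VM
erlang:process_info(self(), [memory, heap_size, total_heap_size]).    %% heap sizes are in words, memory is in bytes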

Let's try a very unscientific experiment:

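%% exit(foo) crashes the shell's evaluator process, so each timing below starts from a fresh process with a small heap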
3> exit(foo).
** exception exit: foo
4> timer:tc(fun() -> lists:map(fun(_) -> 107 end,lists:seq(1,10000000)), ok end).
{10025590,ok}

%% create a fun that evaluates F() in a fresh process with a really big heap
5> RPC = fun(F) -> {_,R} = spawn_opt(fun() -> exit({ok,F()}) end,[{min_heap_size,50000000},monitor]), receive {'DOWN',R,_,_,Res} -> {ok,Return} = Res, Return end end.    
#Fun<erl_eval.6.111823515>
6> exit(foo). 
** exception exit: foo
7> timer:tc(fun() -> RPC(fun() -> lists:map(fun(_) -> 107 end,lists:seq(1,10000000)), ok end) end).
{5378929,ok}

(I did, in fact, run the timing operations several times and kept the best result.)

Of course, by creating a slave process we add some processing overhead compared to calling the function directly, but in this case it is insignificant given the long execution time. By eliminating a lot of expensive GC, we cut the time almost in half.
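
For readability, here is the same RPC helper written out as an ordinary function (run_in_big_heap/1 is just a name made up for this sketch; the logic is the same as in the one-liner at prompt 5):

%% Evaluate F in a throwaway process whose heap is pre-sized so that it
%% should not need to grow, and hand the result back via the exit reason.
run_in_big_heap(F) ->
    {_Pid, Ref} = spawn_opt(fun() -> exit({ok, F()}) end,
                            [{min_heap_size, 50000000}, monitor]),
    receive
        {'DOWN', Ref, process, _, {ok, Result}} ->
            Result
    end.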

(If we actually return the big list from the slave process, the cost increases a bit, but nowhere near as much as we saved on GC, since the VM knows how much it needs to grow the heap to make room for the result.)
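
That variant would look something like this - the inner fun now returns the list itself rather than ok, so the result has to be copied back onto the caller's heap (timings will of course vary from run to run):

{Time, _BigList} =
    timer:tc(fun() ->
                 RPC(fun() -> lists:map(fun(_) -> 107 end, lists:seq(1,10000000)) end)
             end),
Time.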

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com


