[erlang-questions] garbage collection: when?

Wed Jun 1 14:00:18 CEST 2011

On 06/01/2011 01:04 PM, Roberto Ostinelli wrote:
> afaik garbage collection in erlang is per process, but if a process gets
> large, it is automatically switched over to a generational scheme.

No such magic. Each process (in the standard runtime system) has its own 
heap, which is individually garbage collected. The current 
implementation is a copying, generational collector. This means that 
each process heap is individually divided into generations (although 
when a process is new, there is only one generation). Newer generations 
get collected more often than older ones.
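To see that each process really has its own heap, you can compare the heap sizes of two processes with erlang:process_info/2. This is only an illustrative sketch: the module name per_process_heap and the list size are arbitrary choices, and total_heap_size is reported in words.

```erlang
-module(per_process_heap).
-export([demo/0]).

%% Spawn two idle processes: one holds almost nothing, the other holds
%% a 100000-element list. Each process has its own heap, so only the
%% second one's total_heap_size (in words) should be large.
demo() ->
    Parent = self(),
    Small = spawn(fun() -> hold(Parent, ok) end),
    Big = spawn(fun() -> hold(Parent, lists:seq(1, 100000)) end),
    ok = wait_ready(Small),
    ok = wait_ready(Big),
    {total_heap_size, SmallWords} = erlang:process_info(Small, total_heap_size),
    {total_heap_size, BigWords} = erlang:process_info(Big, total_heap_size),
    Small ! stop,
    Big ! stop,
    {SmallWords, BigWords}.

%% Tell the parent we are ready, then keep Data alive until told to stop.
hold(Parent, Data) ->
    Parent ! {ready, self()},
    receive
        stop -> Data
    end.

%% Block until the given process reports that it is ready.
wait_ready(Pid) ->
    receive
        {ready, Pid} -> ok
    end.
```

Calling per_process_heap:demo() returns {SmallWords, BigWords}; the process holding the list should report a far larger heap than the other one, while the caller's own heap is not involved at all.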

> What is unclear to me is when the collection happens. Let me illustrate with
> a very simple example.
>
>
> start() ->
>      Var = [{one, 1}, {two, 2}, ...,{thousand, 1000}],
>      loop(Var).
>
> loop(Var) ->
>      ...
>      other_stuff(Var).
>
> other_stuff(Var) ->
>      ...
>      NewVar = lists:keyreplace(one, 1, Var, {one, "one"}),
>      do_some_other_stuff(NewVar).
>
> do_some_other_stuff(Var) ->
>      ...,
>      loop(Var).
>
>
> My question is: when will the original list be garbage collected? Only
> when this whole process exits? When we go back to loop/1? What if Var
> was quite big [or you had many of these processes] and you wanted to
> optimize memory management?

Garbage collection is not triggered by any particular event (except an 
explicit call to garbage_collect()), but rather when the code tries to 
do something that requires more memory than is currently free on the 
heap, e.g., creating a tuple or a cons cell. It then calls
the garbage collector to try to get some more free space from the newest 
generation - this moves the used memory to one end and all the free 
memory to the other end. If this creates enough contiguous space, the 
code can continue with the allocation. Otherwise, the system will try to 
garbage collect the next older generation, and so on. If all generations 
have been garbage collected and there's still not enough memory for the 
allocation, the Erlang runtime system will enlarge the process' heap (by 
allocating more memory from the operating system).
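One way to observe that collections are driven by allocation is to trace them. The sketch below assumes OTP 19 or later, where the garbage_collection trace flag produces gc_minor_start and gc_major_start messages, and uses a hypothetical helper, gc_trace:count_gcs/1, that runs a fun in a fresh process and counts the collections performed on that process's heap.

```erlang
-module(gc_trace).
-export([count_gcs/1]).

%% Run Fun in a fresh process and count the garbage collections the
%% runtime performs on that process while Fun runs. The calling process
%% acts as the tracer (the default when no tracer is given).
count_gcs(Fun) ->
    Parent = self(),
    Worker = spawn(fun() ->
                       receive go -> ok end,
                       Fun(),
                       Parent ! {done, self()}
                   end),
    %% Enable GC tracing before the worker starts doing any work.
    1 = erlang:trace(Worker, true, [garbage_collection]),
    Worker ! go,
    receive {done, Worker} -> ok end,
    %% Wait until every trace message generated so far has been delivered.
    Ref = erlang:trace_delivered(Worker),
    receive {trace_delivered, Worker, Ref} -> ok end,
    count_starts(Worker, 0).

%% Drain this worker's trace messages, counting each collection start.
count_starts(Worker, N) ->
    receive
        {trace, Worker, gc_minor_start, _Info} -> count_starts(Worker, N + 1);
        {trace, Worker, gc_major_start, _Info} -> count_starts(Worker, N + 1);
        {trace, Worker, _Other, _Info} -> count_starts(Worker, N)
    after 0 ->
        N
    end.
```

A fun that allocates almost nothing, such as fun() -> ok end, will typically trigger no collections, while fun() -> lists:seq(1, 100000) end triggers many, because a new process starts with a small heap (233 words by default) and every time it fills up the collector has to run before allocation can continue.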

Thus, how often garbage collection is triggered depends on how quickly 
you create tuples and other data structures, and the size of the process 
heap depends on whether or not it allocates new data faster than it 
releases old data. If it releases data at the same speed or faster, then 
it will stay at the same size (or even shrink), because garbage 
collection will always be able to reclaim enough space from the existing 
heap.
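The steady-state argument can be checked by running two otherwise identical loops: one drops each batch of data it creates, the other keeps every batch. This is a sketch with arbitrary sizes (200 batches of 1000 integers) and a hypothetical module name; the exact numbers will vary between OTP releases, but the accumulating process should end up with a much larger heap.

```erlang
-module(heap_growth).
-export([demo/0]).

%% Returns {SteadyWords, GrowingWords}: the final total_heap_size of a
%% process that discards each batch and of one that keeps all batches.
demo() ->
    Steady = run(fun(_Batch, Acc) -> Acc end),
    Growing = run(fun(Batch, Acc) -> [Batch | Acc] end),
    {Steady, Growing}.

%% Run the loop in a fresh process so that each measurement sees only
%% that process's own heap.
run(Keep) ->
    Parent = self(),
    Pid = spawn(fun() -> loop(Keep, 200, [], Parent) end),
    receive
        {heap, Pid, Words} -> Words
    end.

loop(_Keep, 0, Acc, Parent) ->
    {total_heap_size, Words} = erlang:process_info(self(), total_heap_size),
    Parent ! {heap, self(), Words},
    %% Use Acc after the measurement so it stays live until then.
    length(Acc);
loop(Keep, N, Acc, Parent) ->
    %% Allocate a fresh 1000-element list on every iteration.
    Batch = lists:seq(1, 1000),
    loop(Keep, N - 1, Keep(Batch, Acc), Parent).
```

Both processes allocate the same amount of data in total; only the one that keeps its batches forces the heap to keep growing, since in the other one each collection finds almost everything to be garbage.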

In your example above, the original list has no more references to it 
after the call to lists:keyreplace(), so it might get collected at that 
point, or at any later point, depending on whether your program needs to 
allocate more data structures and how much space is currently free on 
the process heap. A process that does not try to allocate more data does 
not waste time doing garbage collection either.
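Here is a runnable variant of the loop from the question. It is bounded by an iteration count so that it terminates, and it uses integer keys as hypothetical stand-ins for the atoms one, two, ... in the original. Note that lists:keyreplace/4 only rebuilds the list up to and including the replaced tuple and shares the rest with the original list, so replacing the first element, as here, turns just one cons cell and one tuple into garbage on each iteration.

```erlang
-module(keyreplace_loop).
-export([start/1]).

%% Build a 1000-element key/value list and run the loop Iterations
%% times; returns the final list.
start(Iterations) ->
    Var = [{N, N} || N <- lists:seq(1, 1000)],
    loop(Var, Iterations).

loop(Var, 0) ->
    Var;
loop(Var, N) ->
    other_stuff(Var, N).

other_stuff(Var, N) ->
    %% After this call nothing refers to the old head cell or the old
    %% {1, _} tuple; the tail of Var is shared with NewVar. The garbage
    %% is reclaimed only when a later allocation triggers a collection.
    NewVar = lists:keyreplace(1, 1, Var, {1, N}),
    do_some_other_stuff(NewVar, N).

do_some_other_stuff(Var, N) ->
    loop(Var, N - 1).
```

For example, keyreplace_loop:start(5) returns a 1000-element list whose head is {1, 1} (the value written on the last iteration), with all other tuples unchanged.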

     /Richard