[erlang-questions] garbage collection: when?
Richard Carlsson
carlsson.richard@REDACTED
Wed Jun 1 14:00:18 CEST 2011
On 06/01/2011 01:04 PM, Roberto Ostinelli wrote:
> afaik garbage collection in erlang is per process, but if a process gets
> large, it is automatically switched over to a generational scheme.
No such magic. Each process (in the standard runtime system) has its own
heap, which is individually garbage collected. The current
implementation is a copying, generational collector. This means that
each process heap is individually divided into generations (although
when a process is new, there is only one generation). Newer generations
get collected more often than older ones.
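
To see the per-process heap described above, a process can ask the runtime about itself. This is only a minimal sketch: the module name gc_info and the function show/0 are made up for illustration, while erlang:process_info/2 and the items garbage_collection, heap_size and total_heap_size are standard. Sizes are reported in words, not bytes.

```erlang
-module(gc_info).
-export([show/0]).

%% Hypothetical helper: prints and returns the garbage collection
%% settings and heap sizes of the calling process.
show() ->
    %% Per-process GC settings, e.g. fullsweep_after and min_heap_size.
    {garbage_collection, Info} =
        erlang:process_info(self(), garbage_collection),
    %% Size in words of the youngest heap generation.
    {heap_size, HeapWords} =
        erlang:process_info(self(), heap_size),
    %% Size in words of all heap areas of the process, stack included.
    {total_heap_size, TotalWords} =
        erlang:process_info(self(), total_heap_size),
    io:format("gc settings: ~p~nheap: ~p words, total: ~p words~n",
              [Info, HeapWords, TotalWords]),
    {Info, HeapWords, TotalWords}.
```

Calling gc_info:show() from two different processes will generally give different numbers, since each process has its own heap.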
> What is unclear to me is when the collection happens. Let me illustrate with
> a very simple example.
>
>
> start() ->
>     Var = [{one, 1}, {two, 2}, ...,{thousand, 1000}],
>     loop(Var).
>
> loop(Var) ->
>     ...
>     other_stuff(Var).
>
> other_stuff(Var) ->
>     ...
>     NewVar = lists:keyreplace(one, 1, Var, {one, "one"}),
>     do_some_other_stuff(NewVar).
>
> do_some_other_stuff(Var) ->
>     ...,
>     loop(Var).
>
>
> My question is: when will the original list be garbage collected? Only
> when this whole process exits? When we go back to loop/1? What if Var
> was quite big [or you had many of these processes] and you wanted to
> optimize memory management?
Garbage collection is not triggered by any particular event (except an
explicit call to garbage_collect()). Rather, it runs when the code tries
to do something that requires more memory, e.g., to create a tuple or
cons cell, than is currently free on the process heap. The runtime system
then calls the garbage collector to try to reclaim some space from the
newest generation - this moves the live data to one end and all the free
memory to the other end. If this creates enough contiguous space, the
code can continue with the allocation. Otherwise, the system will try to
garbage collect the next older generation, and so on. If all generations
have been garbage collected and there's still not enough memory for the
allocation, the Erlang runtime system will enlarge the process's heap (by
allocating more memory from the operating system).
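
The explicit call mentioned above can be tried out with a small experiment. The following is a sketch under assumptions: the module gc_demo and its functions are invented for illustration, and the list size of 100000 is arbitrary. It builds a large list in a fresh process, drops it, and reads the memory of the process (in bytes) before and after erlang:garbage_collect().

```erlang
-module(gc_demo).
-export([run/0]).

%% Hypothetical helper: runs the measurement in a fresh process so
%% that its heap starts out small. Returns {BytesBefore, BytesAfter}.
run() ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() -> Parent ! {self(), measure()} end),
    receive
        {Pid, Result} ->
            erlang:demonitor(Ref, [flush]),
            Result;
        {'DOWN', Ref, process, Pid, Reason} ->
            erlang:error({worker_failed, Reason})
    end.

measure() ->
    %% Build a large list and drop it at once; nothing refers to it
    %% afterwards, so it is garbage from here on.
    _ = lists:seq(1, 100000),
    {memory, Before} = erlang:process_info(self(), memory),
    %% Explicitly collect the calling process.
    true = erlang:garbage_collect(),
    {memory, After} = erlang:process_info(self(), memory),
    {Before, After}.
```

The second number is normally much smaller than the first, because the dead list is still sitting on the heap until a collection runs. The exact figures depend on the runtime version and settings.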
Thus, how often garbage collection is triggered depends on how quickly
you create tuples and other data structures, and the size of the process
heap depends on whether or not it allocates new data faster than it
releases old data. If it releases data at the same speed or faster, then
it will stay at the same size (or even shrink), because garbage
collection will always be able to reclaim enough space from the existing
heap.
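
One rough way to watch how allocation drives collection is to count collections while a process churns through short-lived data. This is a sketch only: gc_rate, count_gcs/1 and churn/1 are hypothetical names, and erlang:statistics(garbage_collection) counts collections for the whole node, not just the calling process, so the result is an upper bound for this process.

```erlang
-module(gc_rate).
-export([count_gcs/1]).

%% Hypothetical helper: returns how many garbage collections the node
%% performed while this process built and discarded N lists.
count_gcs(N) ->
    {Before, _, _} = erlang:statistics(garbage_collection),
    churn(N),
    {After, _, _} = erlang:statistics(garbage_collection),
    After - Before.

%% Creates N lists of 1000 integers each and drops every one of them
%% immediately, so the live data stays small while allocation is heavy.
churn(0) ->
    ok;
churn(N) ->
    _ = lists:seq(1, 1000),
    churn(N - 1).
```

Comparing gc_rate:count_gcs(10) with gc_rate:count_gcs(10000) should show the count growing with the amount of data created, even though the process never holds on to more than one list at a time.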
In your example above, the original list has no more references to it
after the call to lists:keyreplace(), so it might get collected at that
point, or at any later point, depending on whether your program needs to
allocate more data structures and how much space is currently free on
the process heap. A process that does not try to allocate more data does
not waste time doing garbage collection either.
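
For experimenting with the loop from the question, here is a runnable variant. It is a sketch, not the original poster's code: the module gc_loop, the message format, the integer keys and the 5000 ms timeout are all assumptions made for illustration. Each replace request makes the process report its own total heap size (in words), so one can watch whether the heap grows as old versions of the list become garbage.

```erlang
-module(gc_loop).
-export([start/0, replace/3, stop/1]).

%% Starts a process that holds a 1000-element key/value list as state.
start() ->
    Var = [{N, N} || N <- lists:seq(1, 1000)],
    spawn(fun() -> loop(Var) end).

%% Replaces the tuple stored under Key and returns the total heap size
%% of the loop process in words, or 'timeout' if it does not answer.
replace(Pid, Key, Value) ->
    Pid ! {replace, self(), Key, Value},
    receive
        {Pid, Words} -> Words
    after 5000 ->
        timeout
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

loop(Var) ->
    receive
        {replace, From, Key, Value} ->
            %% After this call only NewVar is used; the old list is
            %% reclaimed whenever a later allocation triggers a
            %% collection.
            NewVar = lists:keyreplace(Key, 1, Var, {Key, Value}),
            {total_heap_size, Words} =
                erlang:process_info(self(), total_heap_size),
            From ! {self(), Words},
            loop(NewVar);
        stop ->
            ok
    end.
```

Sending many replace requests and looking at the returned sizes gives a feel for the behaviour: the heap is expected to settle at some size rather than grow without bound, since the amount of live data stays the same.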
/Richard