[erlang-questions] garbage collection: when?

Wed Jun 1 14:09:58 CEST 2011

On Wed, Jun 1, 2011 at 13:04, Roberto Ostinelli <roberto@REDACTED> wrote:
>
> start() ->
>     Var = [{one, 1}, {two, 2}, ...,{thousand, 1000}],
>     loop(Var).

So, initially, Var is a reference to a list [{one, ...}, ...].

>
> loop(Var) ->
>     ...
>     other_stuff(Var).

The function call just passes the reference held in Var to other_stuff;
the list itself is not copied.

> other_stuff(Var) ->
>     ...
>     NewVar = lists:keyreplace(one, 1, Var, {one, "one"}),

OK, so the list is Var = [{one, 1} | RestOfList]. The keyreplace will
create a new list, NewVar = [{one, "one"} | RestOfList], with the tail
RestOfList shared between the two: you did not alter anything in that
part, and keyreplace only rebuilds the cells up to the element it
replaces and reuses the rest.
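A minimal sketch that makes the sharing visible, assuming a scratch module named share_demo (hypothetical). It relies on erts_debug:same/2, erts_debug:size/1 and erts_debug:flat_size/1, which are undocumented internal debugging BIFs, so treat them as a diagnostic aid rather than a stable API. The list is built at run time with a comprehension so that it lives on the process heap instead of in the module's literal pool.

```erlang
-module(share_demo).
-export([demo/0]).

%% Returns {TailShared, SharedWords, FlatWords} for two lists that
%% differ only in their first element.
demo() ->
    %% Build the list at run time so it lives on the process heap.
    Var = [{N, N} || N <- lists:seq(1, 1000)],
    %% Replace the head entry; keyreplace reuses the untouched tail.
    NewVar = lists:keyreplace(1, 1, Var, {1, "one"}),
    %% True only if both tails are the very same term in memory.
    TailShared = erts_debug:same(tl(Var), tl(NewVar)),
    %% size/1 counts shared subterms once; flat_size/1 counts them
    %% as if every reference were a separate copy.
    SharedWords = erts_debug:size([Var, NewVar]),
    FlatWords = erts_debug:flat_size([Var, NewVar]),
    {TailShared, SharedWords, FlatWords}.
```

Calling share_demo:demo() from the shell should report true for the tail comparison and a shared size close to half the flat size, since only the new head cell and its tuple are extra.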

>     do_some_other_stuff(NewVar).

Now we pass NewVar on to do_some_other_stuff.

> do_some_other_stuff(Var) ->
>     ...,
>     loop(Var).

Here, Var is the very same reference as NewVar.

> My question is: when will the original list be garbage collected? Only when
> this whole process exits? When we go back to loop/1? What if Var was quite
> big [or you had many of these processes] and you wanted to optimized memory
> management?

Notice that the cons cell holding the original head {one, 1} is now
dead. Hence it will be GC'ed by the next garbage collection that
covers it (if it has already been moved to the old heap, that means
the next full sweep). A collection runs as soon as the process has
allocated enough data to force one. If you know that you just gave
back a lot of data, you can hint the system to do the collection.
Etorrent has an example in its file system processes:

https://github.com/jlouis/etorrent/blob/master/apps/etorrent/src/etorrent_io_file.erl

Notice that we set {fullsweep_after, 0} as a spawn option and that
garbage_collect() is called in a timeout. The basic idea is that the
process is rather long-lived, has a very small heap, and shouldn't be
keeping data around. See the 'erlang' module documentation for the
details.
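Below is a minimal sketch of that pattern, not the etorrent code itself: the module name gc_hint_demo, its message protocol, and the 5000 ms idle timeout are all hypothetical. It spawns the process with {fullsweep_after, 0} through spawn_opt/4 and calls garbage_collect/0 once the mailbox has been idle for a while.

```erlang
-module(gc_hint_demo).
-export([start/0, loop/1]).

%% Hypothetical idle period after which the process forces a collection.
-define(IDLE_TIMEOUT, 5000).

start() ->
    %% {fullsweep_after, 0} makes every collection a full sweep, so
    %% data that died after reaching the old heap is reclaimed too.
    spawn_opt(?MODULE, loop, [[]], [{fullsweep_after, 0}]).

loop(State) ->
    receive
        {put, Term} ->
            loop([Term | State]);
        {get, From} ->
            From ! {state, self(), State},
            loop(State);
        clear ->
            %% The old State becomes garbage here.
            loop([])
    after ?IDLE_TIMEOUT ->
        %% Nothing arrived for a while: hint the VM to collect now
        %% instead of waiting for the next allocation-triggered GC.
        garbage_collect(),
        loop(State)
    end.
```

Because the timeout restarts on every message, the explicit collection only happens once the process has gone quiet, which is when reclaiming freed data costs the least.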

The majority of the original list, namely RestOfList, will not be
collected for as long as it stays live: it is shared as the tail of
NewVar, which is now the live list. To optimize memory management you
can:

* Use data structures and primitives that use less memory. Lists are
notoriously good at consuming memory :)
* Use the halfword emulator. A lot of things in Erlang happen to be
pointers, and squashing them in size helps a lot.
* Use ETS to share data common to your processes.
* Note that large binaries are shared between processes instead of
being copied.
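A minimal sketch of the ETS suggestion, assuming a hypothetical module shared_table: the key/value list is stored once in a public table and each process looks up single keys instead of keeping its own copy of the whole list. Note that ets:lookup/2 still copies the matching object into the caller's heap, and that the table is deleted when the process that created it exits.

```erlang
-module(shared_table).
-export([new/1, lookup/2]).

%% Creates a public set table holding the given {Key, Value} tuples.
%% The calling process becomes the table owner.
new(KeyValues) ->
    Tab = ets:new(shared_kv, [set, public, {read_concurrency, true}]),
    true = ets:insert(Tab, KeyValues),
    Tab.

%% Fetches one entry; only that object is copied to the caller.
lookup(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{Key, Value}] -> {ok, Value};
        [] -> error
    end.
```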

-- 
J.