[erlang-questions] About Garbage Collection

Thu Nov 11 02:50:14 CET 2010

On 10/11/2010, at 7:19 PM, Ş. Volkan Erdoğan wrote:

> Hi,
> I have an idea about memory management in Erlang and although the idea
> is still raw I'd like to share with you.
> 
> As far as I know:
> - There are no global variables in Erlang. Memory is allocated in functions.
> - Function calls in processes and message passing operations to other
> processes have copying semantics.

More precisely, you can't *tell* whether things are copied or not.

But in practice, function arguments are NOT copied.
Consider for example

	len(List) -> len_loop(List, 0).

	len_loop([], L) -> L;
	len_loop([_|Tail], L) -> len_loop(Tail, L+1).

Suppose you have a list L = [1,...,N].
If arguments were copied, len(L) would take O(N**2) time.
Since arguments are *not* copied, len(L) takes O(N) time.

> So I believe that these properties guarantee that any memory location
> allocated by runtime is exclusive to the function which requested that
> memory
> and it is not possible for a reference of a memory location to
> escape from the function that created it.

Consider
	f(X) -> Y = g(X), {Y,Y}.
	g(X) -> [X,X,X].

The memory allocated by g/1 for the three-element list [X,X,X]
is *not* exclusive to g/1 but outlives the call that created it.
The value of Y in f/1 is the very same list allocated in g/1.

So this is wrong.

>  And as a result it is
> possible to safely deallocate all memory allocated by the function at
> function exit since those memory locations can only be accessed by
> that function.

And that's wrong too.

However, there's something that _could_ be done.
There was a system called I think "Formula Algol" developed at CWI
many years ago; a computer algebra system.  I reimplemented it in
Burroughs B6700 Algol in the late 70s and had a lot of fun with it.
The key idea there was that when a function returned,
the new cells it had allocated were copied back to its caller,
and everything not returned was freed.

However, that has problems too.

Consider

	make_list(N)
	  when is_integer(N), N >= 0 ->
	    make_list_loop(N, []).

	make_list_loop(0, L) ->
	    L;
	make_list_loop(N, L) ->
	    make_list_loop(N-1, [N|L]).

For example, make_list(3) ==> [1,2,3].

Copying the results back would make this take O(N**2) time
whereas the real non-copying Erlang system takes O(N) time.

However, if we push on that idea a little harder, we come
close to the idea of REGION allocation.  In effect, the heap
is broken into a lot of little heaps, and a function can
allocate its result in (one of) its caller's heap(s).
A good quick reference to that is
http://mlton.org/Regions
which explains why the MLTon compiler for ML does NOT use
regions.

I don't *think* regions are compatible with hot loading, but
it would be nice to be wrong about that.