[erlang-questions] Garbage Collector Details

Fri Sep 9 17:07:33 CEST 2011

On Fri, Sep 9, 2011 at 13:35, Johannes Auer <johannes.auer@REDACTED> wrote:

> It says that a message is first copied to the private heap of the receiving process, and afterwards a pointer to that message is placed in the message queue. But copying to the heap would require locking, since many processes can send messages concurrently to the same receiving process. Could someone explain what is happening *exactly* when a message is sent from one process to another?

There are three tricks being employed, and I am sure one from the
Erlang/OTP team will correct me if I am wrong here.

First, an Erlang process struct is governed by a series of locks (4 at
the time being). For our purposes the most important locks are the
MSGQ and STATUS locks which together can be called the SENDRECV lock.
The two locks protect some specific fields in the process struct - we
are concerned with the message queue called the InQueue. The InQueue
is a linked list of incoming messages, protected by the MSGQ lock. But
we also need the STATUS lock, so we will take both of these and call
them the SENDRECV lock. There is also another important lock, the MAIN
lock which protects every field not protected by a more specific lock.

Second, notice that the heap of a process can be fragmented. That is,
rather than consist of a contiguous piece of memory, it may consist of
several smaller fragments. When we run a garbage collection in the
process, these fragments will be combined and the fragments will be
released.

Third, the message queue is split into two. There is the above
mentioned InQueue, protected by MSGQ. And there is also the PrivQueue,
the private queue protected by MAIN. The idea is that if you hold the
MSGQ lock, you can send messages but the process can still do stuff as
long as it only needs MAIN.

++

Whenever a process is running, it holds MAIN. This allows it exclusive
access to most fields and the PrivQueue. When process S wants to send
to R, it does the following (in a very simplified explanation):

1. Allocate/Grab a new heap fragment. I don't know how much fragment
reuse there is, but there could be some.
2. Copy the message to the heap fragment. We have exclusive right to
the fragment.
3. Grab the MSGQ and STATUS locks of R.
4. Add the message (and fragment) to the InQueue.
5. Release the MSGQ and STATUS locks of R.

4 is mostly just a pointer assignment, so the time we hold the lock is
extremely small. Also, we don't stop another scheduler with MAIN from
executing the process. Even the copy is "free" because it happens
outside the lock.

When the process executes, it will grab MAIN and do its work off the
PrivQueue. If it exhausts, it'll grab the MSGQ lock and evacuate the
messages to the PrivQueue. There is also an important invariant, which
is that InQueue *never* contains pointers into the process "main
heap". This means that the Garbage Collector does not need to include
InQueue in its root set.

Garbage collection then amounts to combining all the fragments and
collecting garbage. The neat trick here is that the fragments works as
a handoff algorithm. The pointer to the fragment is a "token" and he
who holds that token is responsible for its collection. When we
allocate a new fragment/token and then hand it off to the R process,
we give the R process the responsibility of collecting and reclaiming
the data. But the allocation and copying work is done in S.

Optimistically, S rarely blocks R and R rarely blocks S. Even if we
have multiple senders, they hold the lock for a very short time, so
they are unlikely to trip over each other much. Of course it goes
wrong if there are scores of threads trying to get that lock, but
chances are you have a badly designed program then.

I should also mention that this is a message from S to R locally on
the same machine with S and R different processes. The code path if S
= R or if S and R are on different distributed nodes is different.

-- 
J.