[erlang-questions] message copying overhead atoms vs binaries

Robert Virding robert.virding@REDACTED
Mon Jan 10 21:36:30 CET 2011


----- "David Mercer" <dmercer@REDACTED> wrote:

> On Monday, January 10, 2011, Paolo Negri wrote:
> 
> > There's a sentence in [2] that I can't fully understand.
> > 
> > "Heap binaries are small binaries, up to 64 bytes, that are stored
> > directly on the process heap. They will be copied when the process is
> > garbage collected and when they are sent as a message. They don't
> > require any special handling by the garbage collector."
> > 
> > Specifically I'm confused about why (and where) heap binaries will be
> > copied when the process is garbage collected.
> 
> [I'm not the expert, but since no-one jumped up to answer, I'll give you
> what I think is the answer, and if no-one more qualified answers, mine can
> be considered the default correct answer.]
> 
> Presumably, the Erlang runtime uses a copying garbage collector when it
> garbage collects a process.  At a high level: it creates a new heap area and
> copies the "active" portions of the old heap to the new, and then returns
> the old heap area in its entirety to the system for reuse.  There are
> umpteen variations on this, but at a high level, I'm guessing that is what
> is meant by copying during garbage collection.

Yes, that is what is meant by a copying collector, which is what the BEAM uses to gc process heaps. It might seem inefficient to copy data instead of leaving it be and just marking the free areas, but the work a copying collector does depends on the amount of live data, not on the size of the heap. If the amount of live data is small, which it usually is, then copying becomes more efficient. You also get compaction for "free", which is a big win. The algorithm is also simpler.
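
As a rough illustration of the idea, here is a toy sketch in Python. It is not the BEAM's collector: the heap layout (a list of cells, each holding a payload and a list of indices of other cells) and the name copy_collect are assumptions made up for this example. It only shows why the work done depends on the live data and why the result comes out compacted.

```python
def copy_collect(heap, roots):
    """Copy every cell reachable from roots into a fresh heap.

    heap  -- list of (payload, refs), where refs is a list of indices
             into heap (a made-up layout for this sketch)
    roots -- indices of the cells the process still refers to
    Returns (new_heap, new_roots); unreachable cells are simply not copied.
    """
    new_heap = []
    forward = {}  # old index -> new index, so each cell is copied only once

    def evacuate(old):
        if old not in forward:
            forward[old] = len(new_heap)
            new_heap.append(heap[old])  # refs still use old indices for now
        return forward[old]

    new_roots = [evacuate(r) for r in roots]

    # Scan the copied cells in order, evacuating whatever they reference
    # and rewriting their refs to point into the new heap.
    scan = 0
    while scan < len(new_heap):
        payload, refs = new_heap[scan]
        new_heap[scan] = (payload, [evacuate(r) for r in refs])
        scan += 1
    return new_heap, new_roots


# Four cells, of which only cells 0 and 2 are reachable from the root.
old_heap = [("a", [2]), ("garbage", [0]), ("b", []), ("junk", [])]
new_heap, new_roots = copy_collect(old_heap, [0])
```

The dead cells are never visited, and the live cells end up next to each other at the start of the new heap.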

Messages are copied from the sender's heap to the receiver's heap.

Small binaries, up to 64 bytes, are stored in the process heaps, so they are gc'ed and copied like other data. As binaries can easily become VERY large, doing the same for them would be very inefficient. To get around this, large binaries, more than 64 bytes, are stored outside the process heaps. They are not copied in either gc or message passing, which makes it quite efficient to send them in messages. The problem is that this complicates the collector: it has to handle the fact that many processes may be referencing the same binary, so it can take longer for the collector to detect when a binary is free.
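
The difference between the two kinds of binaries can be sketched with another toy model in Python. Everything here is an assumption for illustration: the names Process, SharedBinary, new_binary, send and drop are made up, and the reference count is just one simple way to model "many processes may be referencing the binary". It is not a description of the runtime's actual data structures.

```python
HEAP_BINARY_LIMIT = 64  # bytes, the threshold quoted above


class SharedBinary:
    """Stand-in for a large binary stored outside every process heap."""

    def __init__(self, data):
        self.data = bytes(data)
        self.refcount = 0  # number of process heaps referring to it


class Process:
    def __init__(self):
        self.heap = []  # small binaries live here; large ones are referenced

    def new_binary(self, data):
        if len(data) <= HEAP_BINARY_LIMIT:
            item = bytearray(data)      # stored directly on this heap
        else:
            item = SharedBinary(data)   # stored outside, heap keeps a reference
            item.refcount += 1
        self.heap.append(item)
        return item

    def send(self, receiver, item):
        """Deliver item to receiver and return what lands on its heap."""
        if isinstance(item, SharedBinary):
            item.refcount += 1          # only the reference is passed on
            receiver.heap.append(item)
            return item
        copy = bytearray(item)          # small binary: the bytes are copied
        receiver.heap.append(copy)
        return copy

    def drop(self, item):
        """Model this process no longer referring to item."""
        self.heap = [x for x in self.heap if x is not item]
        if isinstance(item, SharedBinary):
            item.refcount -= 1


sender, receiver = Process(), Process()
small = sender.new_binary(b"x" * 10)
large = sender.new_binary(b"x" * 1000)
got_small = sender.send(receiver, small)
got_large = sender.send(receiver, large)
```

In this model the large binary can only be freed once every process that received it has dropped it, which is the extra bookkeeping mentioned above.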

Robert

-- 
Robert Virding, Erlang Solutions Ltd.

