[erlang-questions] How binaries are implemented

Björn-Egil Dahlberg egil@REDACTED
Thu Sep 10 14:59:27 CEST 2009


James Hague wrote:
> I'm puzzling over some details in section 4.1 of the Efficiency Guide,
> "How binaries are implemented."
> 
> When a greater than 64 byte binary is created, both a RefC binary (on
> the global heap) and a ProcBin (on the process heap) come into
> existence. Also, any time a smaller binary is matched out of that
> binary, a sub binary is created. According to the docs:
> 
> "All ProcBin objects in a process are part of a linked list, so that
> the garbage collector can keep track of them and decrement the
> reference counters in the binary when a ProcBin disappears. "
> 
I think the terminology is a bit confusing, and Björn will have to 
correct me if I'm wrong here.

A ProcBin is the (Eterm) header that points to the reference counted 
offheap binary, the refc binary.


>  Is a sub-binary essentially the same as a ProcBin, in that it gets
> tracked by being added to a linked list?

No. Only ProcBin headers, which point to the RefC binary, are included 
in the list. The list is there for performance: instead of scanning the 
heap for unreferenced ProcBins after a gc, only the list has to be 
traversed.

Also, sub-binaries are process bound. If you send a sub-binary to 
another process, a new ProcBin pointing to the refc binary is created 
for that process and the reference counter is increased.

> 
> I'd also be interested in more details about when the reference
> counter for RefC binaries gets incremented and decremented.
> 

The counter corresponds directly to the number of ProcBins (Eterm 
headers) pointing to that offheap binary. When unreferenced ProcBins are 
garbage collected the reference counter is decreased, and when new 
ProcBins are created for the binary the reference counter is increased.

Also, drivers may increase and decrease the reference counter for an 
offheap binary.

Sub-binaries point to the ProcBins and keep them alive until they 
themselves are dead and have been collected.

> I'm trying to get a handle on if extreme use of sub binaries is a bad
> case for the emulator.  Imagine an XML parser that keeps the entire
> document in a big binary, then has tens of thousands of sub binaries,
> which are really just pointers and lengths into the master binary.  It
> sounds like it would be cheap, but is the runtime designed for that?
> Or would it be exploiting binaries in ways that cause additional
> expense?

Yes, it is cheap and the runtime system is designed for it.

Additional expense?
No, but some words of caution.
A subbin references its master, so it's not only the subpart of the 
binary that is live, it's the whole binary.
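
A minimal sketch of this effect, and of one way around it when only a 
small piece has to live on. The module name subbin_copy is hypothetical, 
and the binary module is assumed to be available in the OTP release 
being used.

```erlang
-module(subbin_copy).
-export([run/0]).

%% Compare how much binary data a sub-binary refers to with how much
%% a detached copy of the same bytes refers to.
run() ->
    Master = binary:copy(<<"x">>, 4096),    % large refc binary
    <<Tag:8/binary, _/binary>> = Master,    % 8-byte sub-binary of Master
    Detached = binary:copy(Tag),            % fresh copy, no link to Master
    {binary:referenced_byte_size(Tag),
     binary:referenced_byte_size(Detached)}.
```

The first element is expected to be the size of the whole master, the 
second only the 8 bytes of the copy. Copying costs time and memory up 
front, so it only pays off when a few small pieces outlive a large 
master; for a parser that keeps the whole document around anyway, plain 
sub-binaries are the cheaper choice.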

If subbins or procbins survive to the old_heap (generational heap), the 
binary might be more long-lived than intended and more expensive in 
terms of memory. Explicit calls to erlang:garbage_collect/0 or 
erlang:garbage_collect/1, which do a fullsweep of the process, will 
remedy this. This is a drawback, and it is being studied in order to 
improve the current gc strategies.
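
A rough way to watch this from the outside is sketched below. The module 
gc_demo and its helper functions are made up for illustration; 
erlang:memory(binary) reports memory for the whole node, so other 
processes can disturb the numbers and they should be read as an 
indication only.

```erlang
-module(gc_demo).
-export([run/0]).

%% Spawn a process that creates a large binary, drops it, and then
%% forces a collection of itself. Returns the node-wide binary memory
%% sampled before the allocation, after it, and after the collection.
run() ->
    Parent = self(),
    Pid = spawn(fun() -> holder(Parent) end),
    receive
        {Pid, Result} -> Result
    end.

holder(Parent) ->
    Before = erlang:memory(binary),
    make_garbage(),
    Mid = erlang:memory(binary),
    true = erlang:garbage_collect(),        % collect the calling process
    After = erlang:memory(binary),
    Parent ! {self(), {Before, Mid, After}}.

%% Create a 1 MB binary and let go of it again right away.
make_garbage() ->
    _ = binary:copy(<<0>>, 1024 * 1024),
    ok.
```

In a long-running process the explicit call would typically go right 
after the point where the last reference to the large binary has been 
dropped. Calling it too often trades memory for CPU time, so it is 
probably best kept to the few processes that are known to hold on to big 
binaries.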

Regards,
Björn-Egil
Erlang/OTP

