[erlang-questions] How binaries are implemented
Thu Sep 10 14:59:27 CEST 2009
James Hague wrote:
> I'm puzzling over some details in section 4.1 of the Efficiency Guide,
> "How binaries are implemented."
> When a greater than 64 byte binary is created, both a RefC binary (on
> the global heap) and a ProcBin (on the process heap) come into
> existence. Also, any time a smaller binary is matched out of that
> binary, a sub binary is created. According to the docs:
> "All ProcBin objects in a process are part of a linked list, so that
> the garbage collector can keep track of them and decrement the
> reference counters in the binary when a ProcBin disappears. "
I think the terminology is a bit confusing, and Björn will have to
correct me if I'm wrong here.
A ProcBin is the (Eterm) header that points to the reference-counted
off-heap binary, the refc binary.
> Is a sub-binary essentially the same as a ProcBin, in that it gets
> tracked by being added to a linked list?
No. Only ProcBin headers are included in the list, and each of them
points to a RefC binary. The list is there for performance: instead of
scanning the heap for unreferenced ProcBins after a gc, only the list
has to be traversed.
Also, sub-binaries are process-bound. If you send a binary that is a
sub-binary to another process, a new ProcBin is created for that
process; it points to the refc binary, and the reference counter is
incremented.
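
To make the sending case concrete, here is a minimal sketch. It assumes
an OTP release that ships the binary module (R14 or later, so newer
than this thread), and the module name send_demo is made up for
illustration. The part is kept above 64 bytes so that it stays a
reference into the large binary. The receiver reports both the size of
what it received and the size of the binary that term refers to.

```erlang
-module(send_demo).
-export([run/0]).

%% Hypothetical demo module, not part of OTP.
run() ->
    %% 1000 bytes is above the 64-byte limit, so this is a refc binary.
    Big = binary:copy(<<"x">>, 1000),
    %% Match out a 200-byte part of it.
    <<_:100/binary, Part:200/binary, _/binary>> = Big,
    Parent = self(),
    Pid = spawn(fun() ->
        receive
            {bin, B} ->
                %% Report the visible size and the size of the
                %% underlying binary that B refers to.
                Parent ! {self(), byte_size(B),
                          binary:referenced_byte_size(B)}
        end
    end),
    Pid ! {bin, Part},
    receive
        {Pid, Size, Referenced} -> {Size, Referenced}
    after 1000 ->
        timeout
    end.
```

If the receiver reports a referenced size equal to the size of the
large binary, the payload was shared rather than copied, which is the
behaviour described above.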
> I'd also be interested in more details about when the reference
> counter for RefC binaries gets incremented and decremented.
The counter corresponds directly to the number of ProcBins (Eterm
headers) that point to that off-heap binary. When unreferenced ProcBins
get garbage collected, the reference counter is decremented, and when
new ProcBins are created for the binary, the reference counter is
incremented.
Also, drivers may increase and decrease the reference counter for an
Sub-binaries will point to the ProcBins and keep them alive until they
themselves become dead and collected.
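
One way to peek at this from Erlang code is
erlang:process_info(Pid, binary). The sketch below is only an
illustration: the module name refc_demo is made up, and the shape of
the returned entries is not documented and may change between
releases. On the releases I have seen, each entry looks like
{Id, Size, RefCount}, but treat that as an assumption.

```erlang
-module(refc_demo).
-export([run/0]).

%% Hypothetical demo module, not part of OTP.
run() ->
    %% Large enough to be stored off-heap as a refc binary.
    Big = binary:copy(<<"x">>, 1000),
    %% Ask the runtime which off-heap binaries this process refers to.
    %% The entry format is undocumented, so it is returned unparsed.
    {binary, Info} = erlang:process_info(self(), binary),
    %% Big is used here so that it is still live during the call above.
    {byte_size(Big), Info}.
```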
> I'm trying to get a handle on if extreme use of sub binaries is a bad
> case for the emulator. Imagine an XML parser that keeps the entire
> document in a big binary, then has tens of thousands of sub binaries,
> which are really just pointers and lengths into the master binary. It
> sounds like it would be cheap, but is the runtime designed for that?
> Or would it be exploiting binaries in ways that cause additional
Yes, it is cheap and the runtime system is designed for it.
No, but some words of caution.
A subbin references its master, so it's not only the sub-part of the
binary that is live, it's the whole binary.
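
A minimal sketch of how to detect and avoid this, assuming a release
with the binary module (R14 or later). The module name detach_demo and
the helper detach/1 are made up for illustration, and the "more than
twice as large" threshold is an arbitrary choice. The idea is to copy a
small part when it keeps a much larger binary alive, so that the large
one can be freed once nothing else refers to it.

```erlang
-module(detach_demo).
-export([detach/1, run/0]).

%% Hypothetical helper: copy Bin if it refers to a binary that is
%% more than twice its own size, otherwise return it unchanged.
detach(Bin) ->
    case binary:referenced_byte_size(Bin) of
        Large when Large > 2 * byte_size(Bin) ->
            binary:copy(Bin);
        _ ->
            Bin
    end.

run() ->
    Big = binary:copy(<<"x">>, 100000),
    %% Tag is a 100-byte part that still refers to all of Big.
    <<_:10/binary, Tag:100/binary, _/binary>> = Big,
    Kept = detach(Tag),
    %% Referenced sizes before and after detaching.
    {binary:referenced_byte_size(Tag),
     binary:referenced_byte_size(Kept)}.
```

The copy costs some time and memory up front, so it only pays off when
the small part is expected to outlive the master binary.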
If subbins or procbins survive to the old_heap (generational heap), the
binary might be more long-lived than intended and more expensive in
terms of memory. Explicit calls to erlang:garbage_collect/0|1, which do
a fullsweep of the process, will remedy this. This is a drawback that is
being studied in order to improve the current gc strategies.
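
As a rough sketch of the explicit call, a worker process can keep a
copy of the small part it needs and then force a collection once it is
done with the large binary. The module name gc_demo and the message
shapes are made up for illustration, and binary:copy/1 assumes R14 or
later.

```erlang
-module(gc_demo).
-export([run/0]).

%% Hypothetical demo module, not part of OTP.
run() ->
    Parent = self(),
    Pid = spawn(fun() -> worker(Parent) end),
    Pid ! {doc, binary:copy(<<"x">>, 100000)},
    receive
        {Pid, Kept} -> Kept
    after 1000 ->
        timeout
    end.

worker(Parent) ->
    receive
        {doc, Doc} ->
            <<_:10/binary, Part:100/binary, _/binary>> = Doc,
            %% Keep an independent copy of the small part.
            Kept = binary:copy(Part),
            %% Doc and Part are not used after this point, so the
            %% collection below should let this process drop its
            %% reference to the large binary.
            true = erlang:garbage_collect(),
            Parent ! {self(), Kept}
    end.
```

Forcing a collection has its own cost, so it is probably best reserved
for processes that are known to hold on to large binaries for a long
time.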