How binaries are implemented

Wed Sep 9 20:54:33 CEST 2009

I'm puzzling over some details in section 4.1 of the Efficiency Guide,
"How binaries are implemented."

When a binary larger than 64 bytes is created, both a RefC binary
(stored outside the process heaps) and a ProcBin (on the process heap)
come into existence. Also, any time a smaller binary is matched out of
that binary, a sub binary is created. According to the docs:

"All ProcBin objects in a process are part of a linked list, so that
the garbage collector can keep track of them and decrement the
reference counters in the binary when a ProcBin disappears."

Is a sub binary essentially the same as a ProcBin, in that it gets
tracked by being added to a linked list?

I'd also be interested in more details about when the reference
counter for RefC binaries gets incremented and decremented.

I'm trying to work out whether extreme use of sub binaries is a bad
case for the emulator.  Imagine an XML parser that keeps the entire
document in a big binary, then has tens of thousands of sub binaries,
which are really just pointers and lengths into the master binary.  It
sounds like it would be cheap, but is the runtime designed for that?
Or would it be exploiting binaries in ways that cause additional
expense?
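
To make the scenario concrete, here is a rough sketch of the pattern I
have in mind. It is not a real XML parser: it just splits a large
binary on spaces and keeps every token. The module and function names
(subbin_sketch, tokens/1, keep/2, demo/0) are made up for illustration,
and I am assuming, from my reading of the guide, that each matched-out
token is a sub binary pointing into the original rather than a copy.

```erlang
-module(subbin_sketch).
-export([tokens/1, demo/0]).

%% Split Doc on spaces. Every token is matched out of Doc itself,
%% so (by my reading of the guide) each one should be a sub binary.
tokens(Doc) when is_binary(Doc) ->
    tokens(Doc, 0, 0, []).

%% Pos is the offset where the current token starts,
%% Len is the number of bytes scanned in it so far.
tokens(Doc, Pos, Len, Acc) ->
    case Doc of
        %% A space follows the current token: keep it and move past the space.
        <<_:Pos/binary, Tok:Len/binary, $\s, _/binary>> ->
            tokens(Doc, Pos + Len + 1, 0, keep(Tok, Acc));
        %% The current token runs to the end of the document.
        <<_:Pos/binary, Tok:Len/binary>> ->
            lists:reverse(keep(Tok, Acc));
        %% Any other byte: extend the current token by one.
        _ ->
            tokens(Doc, Pos, Len + 1, Acc)
    end.

%% Drop empty tokens (consecutive or trailing spaces).
keep(<<>>, Acc) -> Acc;
keep(Tok, Acc) -> [Tok | Acc].

%% Build a 100000-byte "document" (well over the 64-byte limit)
%% and hold on to all of its tokens at once.
demo() ->
    Doc = list_to_binary(lists:duplicate(20000, "elem ")),
    Toks = tokens(Doc),
    {byte_size(Doc), length(Toks)}.
```

Calling subbin_sketch:demo() leaves one process holding a single large
binary plus twenty thousand small binaries matched out of it, which is
the situation I am asking about.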

James