[erlang-questions] list vs binary performarnce, destructuring and consing

Thu Oct 25 19:36:36 CEST 2012

On Thu, Oct 25, 2012 at 1:14 AM, Björn Gustavsson <bgustavsson@REDACTED>wrote:

> On Wed, Oct 24, 2012 at 5:58 PM, Erik Pearson <erik@REDACTED> wrote:
> [...]
> > And from the section on optimizations:
> >
> > The basis of this optimization is that it the emulator can create bit
> > strings with extra uninitialized space, so if a bit string is built by
> > continuously appending to a binary the data does not need to be copied if
> > there is enough uninitialized data at the end of the bit string.
> >
> > One point I don't quite understand here:
> >
> >
> http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#match_context
> >
> > is is this function:
> >
> > my_list_to_binary(List) ->
> >     my_list_to_binary(List, <<>>).
> >
> > my_list_to_binary([H|T], Acc) ->
> >     my_list_to_binary(T, <<Acc/binary,H>>);
> > my_list_to_binary([], Acc) ->
> >     Acc.
> >
> > the first iteration in my_list_binary/2 that Acc will be copied, but it
> will
> > not be copied after this. I don't know why any copying occurs at all,
> since
> > it is evident that in this case the initial binary <<>> is used solely
> for
> > the append operation in the first clause. From what I understand, it is
> > multiple references that cause copying to occur upon append, since the
> > append operation may result the movement of the binary in memory, and the
> > other references would otherwise become invalid. I only see one reference
> > threading through this function.
>
> Appending to a binary is optimized by the run-time system (which is
> stated in the Efficiency Guide) with no help from the compiler. Therefore,
> the run-time system has no way of knowing that the binary created in
> my_list_to_binary/1 will be appended to, so it will *not* mark the empty
> binary as appendable and reserve extra space in it for appending.
>
> >
> > I would think it would be allocated in my_list_to_binary/1, passed by
> > reference, have an initial size of 256, be appended to without any
> copying,
> > and only maybe be copied if the accumulation exceeded 256 bytes (the
> binary
> > may be moved when it is extended.) Is the implication is that the initial
> > blank binary <<>> is a different type of binary, say fixed-length or
> static,
> > and that the first append operation triggers the copy of that binary
> into a
> > newly allocated extendable binary (with initial size twice the original
> or
> > 256 whichever is bigger)?
>
> Yes, correct. The initial empty binary is a different type of binary. It is
> a heap binary stored in the constant pool for the module, so the
> "creation" of it is extremely cheap.

Okay, this raises, for me at least, a couple of questions:

- The efficiency guide mentions two storage areas for binaries, process
heap and an area outside of the process heap.
- small binaries (up to 64 bytes) are stored on the process heap, larger
ones in the shared "binary" storage.

- there is no mention in the eff. guide of storage of binaries in the
constant pool, but if they are stored there as well, then wouldn't there
just be a pointer reference to it and nothing in the heap?

- Some operations, such as copying, might create a small binary on heap
- Appending will always create a large binary, since the minimum size for
an appendable binary is 256.

- Are there runtime efficiency strategies for re-using large binaries, or
are they always allocated when needed? Or is allocation fast enough to
offset any possible advantage of pooling?

- And finally, the eff. guide suggests creation of a binary by copying
bytes from one to another via append, yet it is also stated that
sub-binaries and match contexts are cheaper than copying. Therefore I would
expect that the most efficient method of creating a new binary would be to
create a sub-binary by something like part, split, or pattern matching +
destructuring, as Dimitry pointed out earlier in this thread (even though
my tests didn't find it to be faster.)

It may be that real world testing under load would reveal different
patterns -- e.g. as excessive memory usage strains the system.

Thanks,
Erik.

>  /Björn
>
> --
> Björn Gustavsson, Erlang/OTP, Ericsson AB
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20121025/b3abb2bd/attachment.htm>