<div><br><div class="gmail_quote">On Thu, Oct 25, 2012 at 1:14 AM, Björn Gustavsson <span dir="ltr"><<a href="mailto:bgustavsson@gmail.com" target="_blank">bgustavsson@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


On Wed, Oct 24, 2012 at 5:58 PM, Erik Pearson <<a href="mailto:erik@defunweb.com" target="_blank">erik@defunweb.com</a>> wrote:<br>

[...]<br>

<div>> And from the section on optimizations:<br>

><br>

> The basis of this optimization is that the emulator can create bit<br>

> strings with extra uninitialized space, so if a bit string is built by<br>

> continuously appending to a binary the data does not need to be copied if<br>

> there is enough uninitialized data at the end of the bit string.<br>

><br>

> One point I don't quite understand here:<br>

><br>

> <a href="http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#match_context" target="_blank">http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#match_context</a><br>

><br>

> is this function:<br>

><br>

> my_list_to_binary(List) -><br>

>     my_list_to_binary(List, <<>>).<br>

><br>

> my_list_to_binary([H|T], Acc) -><br>

>     my_list_to_binary(T, <<Acc/binary,H>>);<br>

> my_list_to_binary([], Acc) -><br>

>     Acc.<br>

><br>

> In the first iteration of my_list_to_binary/2, Acc will be copied, but it will<br>

> not be copied after this. I don't know why any copying occurs at all, since<br>

> it is evident that in this case the initial binary <<>> is used solely for<br>

> the append operation in the first clause. From what I understand, it is<br>

> multiple references that cause copying to occur upon append, since the<br>

> append operation may result in the movement of the binary in memory, and the<br>

> other references would otherwise become invalid. I only see one reference<br>

> threading through this function.<br>

<br>

</div>Appending to a binary is optimized by the run-time system (which is<br>

stated in the Efficiency Guide) with no help from the compiler. Therefore,<br>

the run-time system has no way of knowing that the binary created in<br>

my_list_to_binary/1 will be appended to, so it will *not* mark the empty<br>

binary as appendable and reserve extra space in it for appending.<br>

<div><br></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>

><br>

> I would think it would be allocated in my_list_to_binary/1, passed by<br>

> reference, have an initial size of 256, be appended to without any copying,<br>

> and only maybe be copied if the accumulation exceeded 256 bytes (the binary<br>

> may be moved when it is extended.) Is the implication that the initial<br>

> blank binary <<>> is a different type of binary, say fixed-length or static,<br>

> and that the first append operation triggers the copy of that binary into a<br>

> newly allocated extendable binary (with initial size twice the original or<br>

> 256 whichever is bigger)?<br>

<br>

</div>Yes, correct. The initial empty binary is a different type of binary. It is<br>

a heap binary stored in the constant pool for the module, so the<br>

"creation" of it is extremely cheap.</blockquote><div><br></div><div>Okay, this raises, for me at least, a few questions:</div><div><br></div><div>- The Efficiency Guide mentions two storage areas for binaries: the process heap and an area outside of the process heap.</div>

<div>- Small binaries (up to 64 bytes) are stored on the process heap, larger ones in the shared "binary" storage.</div><div><br></div><div>- There is no mention in the Efficiency Guide of binaries being stored in the constant pool, but if they are stored there as well, wouldn't there just be a pointer to the binary and nothing on the heap?</div>

<div><br></div><div>- Some operations, such as copying, might create a small binary on the heap.</div><div>- Appending will always create a large binary, since the minimum size for an appendable binary is 256 bytes.</div><div><br></div>

<div>- Are there run-time strategies for reusing large binaries, or are they always allocated when needed? Or is allocation fast enough to offset any possible advantage of pooling?</div><div><br></div><div>- And finally, the Efficiency Guide suggests creating a binary by copying bytes from one to another via append, yet it also states that sub-binaries and match contexts are cheaper than copying. Therefore I would expect the most efficient way to create a new binary to be creating a sub-binary with something like part, split, or pattern matching + destructuring, as Dimitry pointed out earlier in this thread (even though my tests didn't find it to be faster).</div>
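<div><br></div><div>To make that last point concrete, here is a minimal sketch of the two approaches I have in mind. The module and function names (bin_sketch, prefix_by_match/2, prefix_by_append/2) are made up for illustration, and I am relying on binary:copy/1,2 and binary:referenced_byte_size/1 behaving as documented for the binary module:</div>
<pre>
-module(bin_sketch).
-export([prefix_by_match/2, prefix_by_append/2, demo/0]).

%% Take the first N bytes by pattern matching. The result is a
%% sub-binary that refers to the data of the original binary.
prefix_by_match(Bin, N) ->
    <<Prefix:N/binary, _/binary>> = Bin,
    Prefix.

%% Take the first N bytes by appending them one at a time to an
%% accumulator, which copies the bytes into a new binary.
prefix_by_append(Bin, N) ->
    prefix_by_append(Bin, N, <<>>).

prefix_by_append(_Bin, 0, Acc) ->
    Acc;
prefix_by_append(<<B, Rest/binary>>, N, Acc) when N > 0 ->
    prefix_by_append(Rest, N - 1, <<Acc/binary, B>>).

%% Compare the two approaches on a 1000-byte binary.
demo() ->
    Big = binary:copy(<<"x">>, 1000),
    A = prefix_by_match(Big, 100),
    B = prefix_by_append(Big, 100),
    %% Both approaches must produce the same 100 bytes.
    true = (A =:= B),
    %% Size of the data referenced by the sub-binary, and by an
    %% explicit copy of it.
    {binary:referenced_byte_size(A),
     binary:referenced_byte_size(binary:copy(A))}.
</pre>
<div>If I read the documentation right, bin_sketch:demo() should return {1000, 100}: the matched prefix still refers to all 1000 bytes of the original, while the explicit copy refers only to its own 100 bytes. That is cheap to create, but it also keeps the whole original binary alive for as long as the prefix is referenced.</div>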

<div><br></div><div>It may be that real-world testing under load would reveal different patterns, e.g. as excessive memory usage strains the system.</div><div><br></div><div>Thanks,</div><div>Erik.</div><div><br></div><div>

<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> </blockquote><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<span><font color="#888888">

/Björn<br>

<br>

--<br>

Björn Gustavsson, Erlang/OTP, Ericsson AB<br>

</font></span></blockquote></div><br></div>