[erlang-questions] message copying overhead atoms vs binaries

Mon Jan 10 16:09:35 CET 2011

On Mon, Jan 10, 2011 at 12:12 PM, Bob Ippolito <bob@REDACTED> wrote:
> On Mon, Jan 10, 2011 at 6:56 PM, Paolo Negri <paolo.negri@REDACTED> wrote:
>> Dear list,
>>
>> I'm trying to understand which way of building the values contained in
>> messages passed between processes is more efficient (in terms of
>> memory and CPU), and whether the difference is significant.
>> I could build my message values in two ways:
>>
>> a) [{color, red}, {shape, circle}]
>>
>> b) [{color, <<"red">>}, {shape, <<"circle">>}]
>>
>> The set of values is strictly limited (on the order of hundreds and
>> not over 1000), and these values are constantly used by the system, so
>> the fact that atoms would be permanently allocated is not a problem in
>> this case.
>>
>> The actual lists are longer and vary in length from 5 to 10 elements.
>> The processes will exchange thousands of these messages each second,
>> which is why I'm asking this question.
>>
>> I'd like to know which form is cheaper in terms of copy operations,
>> both internally on a single node and across nodes hosted on
>> different physical machines.
>
> Atoms are cheaper. They are always 1 word on the heap and the rest
> shared in the atom table. Even when binaries are shared, they are
> larger than this [1]. In your case the binaries would not be shared
> anyway, they are smaller than 64 bytes so they are considered heap
> binaries [2]. The external term format also has optimizations that
> allow for an atom table to compactly represent them by reference
> instead of value, where other types do not have this kind of
> optimization [3].
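
The heap-size part of the above can be checked roughly from a small
module. This is only a sketch: the module name msg_size and the function
compare/0 are made up for illustration, and erts_debug:flat_size/1 is an
internal debugging helper rather than a supported API, so treat the numbers
as indicative. It reports the number of heap words each term would occupy
when copied into another process on the same node; it says nothing about
the distribution encoding described in [3].

```erlang
-module(msg_size).
-export([compare/0]).

%% Hypothetical helper: returns the flat heap size, in words, of the two
%% message shapes from the original question.
compare() ->
    Atoms    = [{color, red}, {shape, circle}],
    Binaries = [{color, <<"red">>}, {shape, <<"circle">>}],
    %% flat_size/1 counts the words needed to copy the whole term,
    %% including the payload of small (heap) binaries.
    [{atoms,    erts_debug:flat_size(Atoms)},
     {binaries, erts_debug:flat_size(Binaries)}].
```

Calling msg_size:compare() in the shell should show a larger word count for
the binary version, since each heap binary carries its own header and data
words while each atom is a single immediate word.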

There's a sentence in [2] that I can't fully understand.

"Heap binaries are small binaries, up to 64 bytes, that are stored
directly on the process heap. They will be copied when the process is
garbage collected and when they are sent as a message. They don't
require any special handling by the garbage collector."

Specifically I'm confused about why (and where) heap binaries will be
copied when the process is garbage collected.

Thanks,

Paolo

>
> [1] http://www.erlang.org/doc/efficiency_guide/advanced.html
> [2] http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
> [3] http://www.erlang.org/doc/apps/erts/erl_ext_dist.html
>
> -bob
>