[erlang-questions] message copying overhead atoms vs binaries

Mon Jan 10 12:12:01 CET 2011

On Mon, Jan 10, 2011 at 6:56 PM, Paolo Negri <paolo.negri@REDACTED> wrote:
> Dear list,
>
> I'm trying to understand what approach of building values contained in
> messages to pass across processes would be more efficient (in terms of
> memory and cpu) and if the difference is significant.
> I could build my message values in two ways
>
> a) [{color, red}, {shape, circle}]
>
> b) [{color, <<"red">>}, {shape, <<"circle">>}]
>
> The set of values is strictly limited (in the order of hundreds and
> not over 1000) and these values are constantly used by the system so
> the fact that atoms would be permanently allocated is not a problem in
> this case.
>
> The actual lists are longer and can vary in length between 5 and 10 elements.
> The processes will exchange thousands of these messages each second
> and this is the reason why I'm asking this question.
>
> I'd like to know which form is cheaper in terms of copy operations,
> both internally on a single node and across nodes hosted on
> different physical machines.

Atoms are cheaper. An atom always takes 1 word on the heap, and the
rest is shared in the atom table. Even when binaries are shared, they
are larger than this [1]. In your case the binaries would not be shared
anyway: they are smaller than 64 bytes, so they are treated as heap
binaries and copied in full with every message [2]. The external term
format also has an optimization that lets an atom table represent atoms
compactly by reference instead of by value, whereas other types do not
have this kind of optimization [3].
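
A rough way to check this on your own system is sketched below. The
module name msg_size and the function compare/0 are made up for
illustration. erts_debug:flat_size/1 is an undocumented debugging
helper that reports the size of a term in words, so treat its numbers
as indicative only. Note that term_to_binary/1 does not use the
distribution atom cache, so it only hints at the cross-node cost and
probably understates the advantage of atoms there.

```erlang
-module(msg_size).
-export([compare/0]).

%% Hypothetical helper: compares the two message shapes from the question.
%% Returns a proplist with the flat size in words (what gets copied when
%% the term is sent to another local process) and the encoded size in
%% bytes of the external term format.
compare() ->
    Atoms = [{color, red}, {shape, circle}],
    Bins  = [{color, <<"red">>}, {shape, <<"circle">>}],
    [{atoms_words,     erts_debug:flat_size(Atoms)},
     {bins_words,      erts_debug:flat_size(Bins)},
     {atoms_ext_bytes, byte_size(term_to_binary(Atoms))},
     {bins_ext_bytes,  byte_size(term_to_binary(Bins))}].
```

After compiling the module, calling msg_size:compare() in the shell
should show the atom version as smaller in both words and bytes. The
exact figures depend on the word size and the OTP release.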

[1] http://www.erlang.org/doc/efficiency_guide/advanced.html
[2] http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
[3] http://www.erlang.org/doc/apps/erts/erl_ext_dist.html

-bob