[erlang-questions] term_to_binary and record improvements

Ulf Wiger (TN/EAB) ulf.wiger@REDACTED
Thu Aug 28 21:35:53 CEST 2008


Loss of sharing happens not only in term_to_binary()
but also when passing the data in a message.

See e.g.

http://www.erlang.org/pipermail/erlang-bugs/2007-November/000488.html

http://www.erlang.org/pipermail/erlang-questions/2005-November/017924.html
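The effect on message passing can be observed with a small sketch like the one below. It is only an illustration, not code from the threads linked above: the module name sharing_msg is hypothetical, and erts_debug:size/1 is an undocumented debugging helper that reports a term's heap size in words while counting shared subterms once. On an emulator built without sharing-preserving copy, the size measured in the receiving process would be expected to be noticeably larger than the size measured in the sender.

```erlang
-module(sharing_msg).
-export([test/0]).

%% Returns {SizeInSender, SizeInReceiver}, both in heap words.
%% erts_debug:size/1 counts each shared subterm only once.
test() ->
    Big = lists:duplicate(1000, a),
    Y = {Big, Big, Big, Big},          % four pointers to the same list
    Parent = self(),
    Pid = spawn(fun() ->
                    receive
                        {Parent, Term} ->
                            %% Measure the copy that arrived in this process.
                            Parent ! {self(), erts_debug:size(Term)}
                    end
                end),
    Pid ! {Parent, Y},
    receive
        {Pid, SizeInReceiver} ->
            {erts_debug:size(Y), SizeInReceiver}
    end.
```

Calling sharing_msg:test() from the shell after compiling the module returns the two sizes side by side, so the growth caused by the copy is directly visible.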

We've also observed this when replicating call states
between processors. The copies take up more space than
the originals, though not terribly much, since there wasn't
that much sharing to begin with.

But as has been demonstrated, loss of sharing is a potential
killer in some cases, and there aren't always workarounds.

BR,
Ulf W


Joe Armstrong skrev:
> I got to thinking about records and structs, and this led me to
> think about the behaviour of term_to_binary.
> 
> term_to_binary has a misfeature that would cause problems in
> implementing dynamic records.
> 
> term_to_binary does not efficiently encode shared data
> structures. This is best illustrated by an example:
> 
> Consider this
> 
> -module(test3).
> -compile(export_all).
> 
> test() ->
>     Big = lists:duplicate(1000,a),
>     X = {Big},
>     Y = {Big,Big,Big,Big},
>     {sizeOf(Big),sizeOf(X), sizeOf(Y)}.
> 
> sizeOf(T) -> size(term_to_binary(T)).
> 
> 
> Look what happens when we run this:
> 
> 1> c(test3).
> {ok,test3}
> 2> test3:test().
> {4007,4009,16027}
> 
> The third number in this tuple surprises me.  I had expected it to be
> 12 bytes larger than 4009. Internally Y is a pointer to four words (an
> arity tag, with value 4), then 4 identical pointers. But the fact that
> sizeOf(Y) is roughly four times sizeOf(X) means that shared sub-structures in
> Erlang terms do not become shared in the binary representation of the
> term.
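The in-memory picture described in the quoted text can be compared with the result of a round trip through the binary form. The following is a minimal sketch under stated assumptions: the module name sharing_size is hypothetical, and erts_debug:size/1 and erts_debug:flat_size/1 are undocumented debugging helpers that report heap words with and without sharing taken into account.

```erlang
-module(sharing_size).
-export([test/0]).

%% Returns {Shared, Flat, RoundTrip}, all in heap words:
%%   Shared    - size of Y with the shared list counted once
%%   Flat      - size of Y as if every reference had its own copy
%%   RoundTrip - size of Y after term_to_binary/binary_to_term
test() ->
    Big = lists:duplicate(1000, a),
    Y = {Big, Big, Big, Big},
    Shared = erts_debug:size(Y),
    Flat = erts_debug:flat_size(Y),
    RoundTrip = erts_debug:size(binary_to_term(term_to_binary(Y))),
    {Shared, Flat, RoundTrip}.
```

If the encoding flattens the term as described, RoundTrip should come out close to Flat rather than to Shared, which is the same loss of sharing seen in the byte counts above.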

...


