[erlang-questions] list vs binary performarnce, destructuring and consing

Tue Oct 23 00:18:48 CEST 2012

Hi,
I've read from advice given many years ago that processing binaries byte by
byte (e.g. a recursive parser), performance is better using a list to
accumulate the bytes, rather than using binary concatenation. So [B|Accum]
rather than <<Accum/binary, B>>. There seems to be a consensus  however, on
the efficiency of Binaries compared to List strings.

My own quick test, which was just to copy a list or binary element by
element, showed much better performance for the list version. The test was
basically to pass an arbitrary string or binary, and copy it some number of
thousands of times, and output the complete copies per second.

I tried list based accumulation for a binary, using binary destructuring in
the function head, and that sped things up, but it was still slower than
the equivalent list string copy.

Are there any tips for binaries? Of is this not a good use case for
binaries.

test_bin_copy(Bin) ->
    test_bin_copy(Bin, <<>>).
test_bin_copy(<<>>, Accum) ->
    Accum;
test_bin_copy(<<Char, Rest/binary>>, Accum) ->
    test_bin_copy(Rest, <<Accum/binary, Char>>).

test_string_copy(Bin) ->
    test_string_copy(Bin, []).
test_string_copy([], Accum) ->
    lists:reverse(Accum);
test_string_copy([Char|Rest], Accum) ->
    test_string_copy(Rest, [Char|Accum]).

For what its worth this is part of a json module. The current practice in
json libraries seems to  favor binaries, so I assumed there were inherent
performance advantages. I can imagine, e.g., that an empty binary would be
stored as a modest sized buffer that would be appended in place until there
was a need to expand it or copy (e.g. if an older version of it was being
appended), and that operations on it would be fast compared to arbitrary
consing (which is however highly optimized.)

I think some of the favoritism for binaries in json libs is because it
makes it easy to differentiate json strings (as erlang binaries) from json
arrays (as erlang lists), but my implementation is using tagged tuples to
contain each json value, so this is not a concern. Of course there are the
memory concerns, but in my tests any memory concerns with list char size vs
binary bytes is erased by the performance gains.

I'm sure I've put my foot in my mouth at least once, but, anyway, advice
appreciated.

Thanks,
Erik.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20121022/a1803512/attachment.htm>