[erlang-questions] list vs binary performance, destructuring and consing

Erik Pearson erik@REDACTED
Tue Oct 23 17:11:03 CEST 2012


I used something simple like this:

test_iter(Mod, Fun, Args, Iters) ->
    test_iter(Mod, Fun, Args, os:timestamp(), Iters, Iters).

%% Countdown finished: return completed calls per second
%% (timer:now_diff/2 returns microseconds).
test_iter(_Mod, _Fun, _Args, Start, Iters, 0) ->
    Iters/(timer:now_diff(os:timestamp(), Start)/1000000);

test_iter(Mod, Fun, Args, Start, Iters, CountDown) ->
    erlang:apply(Mod, Fun, Args),
    test_iter(Mod, Fun, Args, Start, Iters, CountDown-1).

I was just looking at total iterations per second, repeating each run
several times until I got a relatively stable reading. Sure, there was
variation, but I'm trying to simulate the pattern I use in this library,
which is iterating through and copying many small pieces of text (JSON keys
and values).
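A minimal sketch of a variant of this harness, assuming a reasonably modern OTP release: it times the whole loop with `timer:tc/1` rather than computing the elapsed time by hand, repeats the measurement a fixed number of times, and reports the median rate so a single noisy run has less influence. The module `bench_sketch` and its functions are hypothetical, not part of the json module discussed here.

```erlang
-module(bench_sketch).
-export([rate/4, median_rate/5]).

%% Hypothetical helper: call Mod:Fun(Args...) Iters times and return
%% completed calls per second. timer:tc/1 reports microseconds.
rate(Mod, Fun, Args, Iters) when is_integer(Iters), Iters > 0 ->
    {Micros, ok} = timer:tc(fun() -> loop(Mod, Fun, Args, Iters) end),
    %% Avoid dividing by zero if a very short run reads as 0 microseconds.
    Iters / (erlang:max(Micros, 1) / 1000000).

%% Hypothetical helper: repeat rate/4 Runs times and return the median
%% (the lower median when Runs is even).
median_rate(Mod, Fun, Args, Iters, Runs) when is_integer(Runs), Runs > 0 ->
    Rates = lists:sort([rate(Mod, Fun, Args, Iters) || _ <- lists:seq(1, Runs)]),
    lists:nth((Runs + 1) div 2, Rates).

%% Call the function N times, discarding the results.
loop(_Mod, _Fun, _Args, 0) ->
    ok;
loop(Mod, Fun, Args, N) ->
    _ = erlang:apply(Mod, Fun, Args),
    loop(Mod, Fun, Args, N - 1).
```

For example, `bench_sketch:median_rate(json, test_bin_copy, [<<"hi, my name is erik">>], 100000, 5)` and the same call with `test_string_copy` and a list argument. Using the same iteration count for both versions keeps the comparison like for like; the runs shown below use 1000 iterations for the binary version and 100000 for the list version.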

Since there was such a large difference in overall performance (the list
string version being more than twice as fast), I didn't feel the need to be
more precise before posing the question.

E.g.

20> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
232018.5614849188
21> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
224870.69934787496
22> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
226193.16896629723
23>
23> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
650732.3993154295
24> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
608076.4716970806
25> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
567359.7912115968

Many of the follow-up observations and questions have been stimulating, so
I'm now also interested in a more detailed analysis.

In the end, though, what I'm looking at is the difference in performance
between list and binary string processing written in what I believe is
idiomatic Erlang for such problems.

Thanks,

Erik.

On Tue, Oct 23, 2012 at 1:07 AM, Martynas Pumputis <martynasp@REDACTED> wrote:

> Hi,
>
> Could you show the exact steps of your simulation? The binary version
> should be faster, because some extra memory allocation is avoided on each
> iteration and large binaries aren't copied.
>
> Take a look at: http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
>
> Martynas
>
>
> On 10/23/2012 12:18 AM, Erik Pearson wrote:
>
>> Hi,
>> I've read advice, given many years ago, that when processing binaries
>> byte by byte (e.g. in a recursive parser), performance is better if you
>> accumulate the bytes in a list rather than using binary concatenation,
>> i.e. [B|Accum] rather than <<Accum/binary, B>>. There seems to be a
>> consensus, however, that binaries are more efficient than list strings.
>>
>> My own quick test, which just copied a list or binary element by
>> element, showed much better performance for the list version. The test
>> basically took an arbitrary string or binary, copied it some thousands
>> of times, and reported the number of complete copies per second.
>>
>> I tried list-based accumulation for a binary, using binary destructuring
>> in the function head, and that sped things up, but it was still slower
>> than the equivalent list string copy.
>>
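The variant described in the paragraph above (destructure the binary in the function head, but accumulate into a list) might look something like the following sketch. The module and function names are hypothetical, and finishing with `lists:reverse/1` plus `list_to_binary/1` is one common choice, not necessarily what the original test did.

```erlang
-module(copy_sketch).
-export([bin_copy_via_list/1]).

%% Hypothetical variant: match one byte off the front of the binary in the
%% function head, cons it onto a list, and build a binary once at the end.
bin_copy_via_list(Bin) when is_binary(Bin) ->
    bin_copy_via_list(Bin, []).

bin_copy_via_list(<<>>, Acc) ->
    %% Acc holds the bytes in reverse order.
    list_to_binary(lists:reverse(Acc));
bin_copy_via_list(<<Char, Rest/binary>>, Acc) ->
    bin_copy_via_list(Rest, [Char | Acc]).
```

One trade-off to keep in mind: this allocates a cons cell per byte before producing the binary, whereas the `<<Accum/binary, Char>>` version relies on the runtime's append optimization described in the efficiency guide.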
>> Are there any tips for binaries? Or is this not a good use case for
>> binaries?
>>
>> %% Binary version: match one byte off the front, append it to a binary.
>> test_bin_copy(Bin) ->
>>      test_bin_copy(Bin, <<>>).
>> test_bin_copy(<<>>, Accum) ->
>>      Accum;
>> test_bin_copy(<<Char, Rest/binary>>, Accum) ->
>>      test_bin_copy(Rest, <<Accum/binary, Char>>).
>>
>> %% List version: cons each character onto the accumulator, then reverse.
>> test_string_copy(Str) ->
>>      test_string_copy(Str, []).
>> test_string_copy([], Accum) ->
>>      lists:reverse(Accum);
>> test_string_copy([Char|Rest], Accum) ->
>>      test_string_copy(Rest, [Char|Accum]).
>>
>> For what it's worth, this is part of a JSON module. Current practice in
>> JSON libraries seems to favor binaries, so I assumed they had inherent
>> performance advantages. I can imagine, e.g., that an empty binary would
>> be stored as a modest-sized buffer that is appended to in place until it
>> needs to be expanded or copied (e.g. if an older version of it is
>> appended to), and that operations on it would be fast compared to
>> arbitrary consing (which is, however, highly optimized).
>>
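When the goal is only to pull a string value out of a larger binary (such as a JSON key or value), a different technique, not one proposed in this thread, avoids per-byte appending altogether: find where the value ends, then match it out as a sub-binary, which typically refers to the original binary's data instead of copying the bytes. The sketch below is a hypothetical `take_until_quote/1` that assumes the input contains no escape sequences; a real JSON parser must handle `\"` and other escapes.

```erlang
-module(slice_sketch).
-export([take_until_quote/1]).

%% Hypothetical helper: split Bin at the first $" character.
%% Returns {ok, Str, Rest}, where Str is everything before the quote and
%% Rest is everything after it. Escape sequences are NOT handled.
take_until_quote(Bin) when is_binary(Bin) ->
    case binary:match(Bin, <<"\"">>) of
        {Pos, 1} ->
            %% Str is matched out as a sub-binary; no byte-by-byte copy.
            <<Str:Pos/binary, $", Rest/binary>> = Bin,
            {ok, Str, Rest};
        nomatch ->
            {error, no_closing_quote}
    end.
```

A sub-binary keeps the whole original binary alive while it is referenced; `binary:copy/1` can be used when a small slice has to outlive a large input.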
>> I think some of the preference for binaries in JSON libs is because it
>> makes it easy to distinguish JSON strings (as Erlang binaries) from JSON
>> arrays (as Erlang lists), but my implementation uses tagged tuples to
>> contain each JSON value, so this is not a concern. Of course there are
>> memory concerns, but in my tests any concern about list character size
>> vs. binary bytes is outweighed by the performance gains.
>>
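As an illustration of the tagged-tuple approach described above, here is a hypothetical representation; the tag names (`string`, `array`, `object`, and so on) are assumptions, not taken from Erik's module, which isn't shown. Because every value carries its own tag, a string can be stored as either a list or a binary without being confused with a JSON array.

```erlang
-module(json_value_sketch).
-export([kind/1, string_length/1]).

%% Hypothetical tagged-tuple representation of JSON values:
%%   {string, Chars}   Chars is a list string or a binary
%%   {array, Values}   Values is a list of tagged values
%%   {object, Pairs}   Pairs is a list of {KeyChars, Value}
%%   {number, N}, {boolean, B}, null
kind({string, _}) -> string;
kind({array, _}) -> array;
kind({object, _}) -> object;
kind({number, _}) -> number;
kind({boolean, _}) -> boolean;
kind(null) -> null.

%% The tag, not the Erlang type, marks the value as a string, so both
%% storage choices work. For binaries this counts bytes, which equals
%% the character count only for single-byte (e.g. ASCII) text.
string_length({string, Chars}) when is_list(Chars) -> length(Chars);
string_length({string, Chars}) when is_binary(Chars) -> byte_size(Chars).
```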
>> I'm sure I've put my foot in my mouth at least once, but, anyway, advice
>> appreciated.
>>
>> Thanks,
>> Erik.
>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>

