[erlang-questions] list vs binary performarnce, destructuring and consing

Tue Oct 23 17:34:23 CEST 2012

Hello,

I believe you should trade-off between memory utilization and performance.
Yes, lists are faster then binary but requires more memory.

I have figure out that binary parsing might be efficient if you scan it by changing index and capture tokens instead of single byte.
, e.g.:

parse(In, Pos, Len, Line, Fun, Acc0) ->
case In of
   % end of field
   <<_:Pos/binary, Tkn:Len/binary, ?FIELD_BY,  _/binary>> ->
         parse(In, Pos + Len + 1, 0, [Tkn | Line], Fun, Acc0);
   ...
   _ ->
      % no match increase token
      parse(In, Pos, Len + 1, Line, Fun, Acc0)
end;

- Dmitry

On Oct 23, 2012, at 6:11 PM, Erik Pearson wrote:

> I used a simple thing like:
> 
> test_iter(Mod, Fun, Args, Iters) ->
>     test_iter(Mod, Fun, Args, now(), Iters, Iters).
> 
> test_iter(_Mod, _Fun, _Args, Start, Iters, 0) ->
>     Iters/(timer:now_diff(now(), Start)/1000000);
> 
> test_iter(Mod, Fun, Args, Start, Iters, CountDown) ->
>     erlang:apply(Mod, Fun, Args),
>     test_iter(Mod, Fun, Args, Start, Iters, CountDown-1).
> 
> And was just looking at total iterations per sec. I would just repeat this several times until I found a relatively stable reading. Sure, there was variation, but I'm looking to simulate the pattern I use in this library, which is iterating through and copying many small bits of text (json keys and values.)
> 
> Since there was such a large difference in overall performance (string being more than twice as fast), I didn't feel the need to be more precise before posing the question.
> 
> E.g.
> 
> 20> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
> 232018.5614849188
> 21> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
> 224870.69934787496
> 22> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
> 226193.16896629723
> 23> 
> 23> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
> 650732.3993154295
> 24> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
> 608076.4716970806
> 25> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
> 567359.7912115968
> 
> Many of the follow up observations and questions have been stimulating, so I'm now interested as well in a more detailed analysis.
> 
> However, in the end what I'm looking at is the differences in performance between list and binary string processing under what I believe is idiomatic Erlang for such problems.
> 
> Thanks,
> 
> Erik.
> 
> On Tue, Oct 23, 2012 at 1:07 AM, Martynas Pumputis <martynasp@REDACTED> wrote:
> Hi,
> 
> Could you show the exact steps of your simulation? Binary version should be faster, because some extra memory allocation is avoided per each iteration and large binaries aren't being copied.
> 
> Take a look at: http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
> 
> Martynas
> 
> 
> On 10/23/2012 12:18 AM, Erik Pearson wrote:
> Hi,
> I've read from advice given many years ago that processing binaries byte
> by byte (e.g. a recursive parser), performance is better using a list to
> accumulate the bytes, rather than using binary concatenation. So
> [B|Accum] rather than <<Accum/binary, B>>. There seems to be
> a consensus  however, on the efficiency of Binaries compared to List
> strings.
> 
> My own quick test, which was just to copy a list or binary element by
> element, showed much better performance for the list version. The test
> was basically to pass an arbitrary string or binary, and copy it some
> number of thousands of times, and output the complete copies per second.
> 
> I tried list based accumulation for a binary, using binary destructuring
> in the function head, and that sped things up, but it was still slower
> than the equivalent list string copy.
> 
> Are there any tips for binaries? Of is this not a good use case for
> binaries.
> 
> test_bin_copy(Bin) ->
>      test_bin_copy(Bin, <<>>).
> test_bin_copy(<<>>, Accum) ->
>      Accum;
> test_bin_copy(<<Char, Rest/binary>>, Accum) ->
>      test_bin_copy(Rest, <<Accum/binary, Char>>).
> 
> test_string_copy(Bin) ->
>      test_string_copy(Bin, []).
> test_string_copy([], Accum) ->
>      lists:reverse(Accum);
> test_string_copy([Char|Rest], Accum) ->
>      test_string_copy(Rest, [Char|Accum]).
> 
> For what its worth this is part of a json module. The current practice
> in json libraries seems to  favor binaries, so I assumed there were
> inherent performance advantages. I can imagine, e.g., that an empty
> binary would be stored as a modest sized buffer that would be appended
> in place until there was a need to expand it or copy (e.g. if an older
> version of it was being appended), and that operations on it would be
> fast compared to arbitrary consing (which is however highly optimized.)
> 
> I think some of the favoritism for binaries in json libs is because it
> makes it easy to differentiate json strings (as erlang binaries) from
> json arrays (as erlang lists), but my implementation is using tagged
> tuples to contain each json value, so this is not a concern. Of course
> there are the memory concerns, but in my tests any memory concerns with
> list char size vs binary bytes is erased by the performance gains.
> 
> I'm sure I've put my foot in my mouth at least once, but, anyway, advice
> appreciated.
> 
> Thanks,
> Erik.
> 
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
> 
> 
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20121023/679f69af/attachment.htm>