[erlang-questions] list vs binary performarnce, destructuring and consing
Dmitry Kolesnikov
dmkolesnikov@REDACTED
Tue Oct 23 17:34:23 CEST 2012
Hello,
I believe you should trade-off between memory utilization and performance.
Yes, lists are faster then binary but requires more memory.
I have figure out that binary parsing might be efficient if you scan it by changing index and capture tokens instead of single byte.
, e.g.:
parse(In, Pos, Len, Line, Fun, Acc0) ->
case In of
% end of field
<<_:Pos/binary, Tkn:Len/binary, ?FIELD_BY, _/binary>> ->
parse(In, Pos + Len + 1, 0, [Tkn | Line], Fun, Acc0);
...
_ ->
% no match increase token
parse(In, Pos, Len + 1, Line, Fun, Acc0)
end;
- Dmitry
On Oct 23, 2012, at 6:11 PM, Erik Pearson wrote:
> I used a simple thing like:
>
> test_iter(Mod, Fun, Args, Iters) ->
> test_iter(Mod, Fun, Args, now(), Iters, Iters).
>
> test_iter(_Mod, _Fun, _Args, Start, Iters, 0) ->
> Iters/(timer:now_diff(now(), Start)/1000000);
>
> test_iter(Mod, Fun, Args, Start, Iters, CountDown) ->
> erlang:apply(Mod, Fun, Args),
> test_iter(Mod, Fun, Args, Start, Iters, CountDown-1).
>
> And was just looking at total iterations per sec. I would just repeat this several times until I found a relatively stable reading. Sure, there was variation, but I'm looking to simulate the pattern I use in this library, which is iterating through and copying many small bits of text (json keys and values.)
>
> Since there was such a large difference in overall performance (string being more than twice as fast), I didn't feel the need to be more precise before posing the question.
>
> E.g.
>
> 20> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
> 232018.5614849188
> 21> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
> 224870.69934787496
> 22> json:test_iter(json, test_bin_copy, [<<"hi, my name is erik">>], 1000).
> 226193.16896629723
> 23>
> 23> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
> 650732.3993154295
> 24> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
> 608076.4716970806
> 25> json:test_iter(json, test_string_copy, ["hi, my name is erik"], 100000).
> 567359.7912115968
>
> Many of the follow up observations and questions have been stimulating, so I'm now interested as well in a more detailed analysis.
>
> However, in the end what I'm looking at is the differences in performance between list and binary string processing under what I believe is idiomatic Erlang for such problems.
>
> Thanks,
>
> Erik.
>
> On Tue, Oct 23, 2012 at 1:07 AM, Martynas Pumputis <martynasp@REDACTED> wrote:
> Hi,
>
> Could you show the exact steps of your simulation? Binary version should be faster, because some extra memory allocation is avoided per each iteration and large binaries aren't being copied.
>
> Take a look at: http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
>
> Martynas
>
>
> On 10/23/2012 12:18 AM, Erik Pearson wrote:
> Hi,
> I've read from advice given many years ago that processing binaries byte
> by byte (e.g. a recursive parser), performance is better using a list to
> accumulate the bytes, rather than using binary concatenation. So
> [B|Accum] rather than <<Accum/binary, B>>. There seems to be
> a consensus however, on the efficiency of Binaries compared to List
> strings.
>
> My own quick test, which was just to copy a list or binary element by
> element, showed much better performance for the list version. The test
> was basically to pass an arbitrary string or binary, and copy it some
> number of thousands of times, and output the complete copies per second.
>
> I tried list based accumulation for a binary, using binary destructuring
> in the function head, and that sped things up, but it was still slower
> than the equivalent list string copy.
>
> Are there any tips for binaries? Of is this not a good use case for
> binaries.
>
> test_bin_copy(Bin) ->
> test_bin_copy(Bin, <<>>).
> test_bin_copy(<<>>, Accum) ->
> Accum;
> test_bin_copy(<<Char, Rest/binary>>, Accum) ->
> test_bin_copy(Rest, <<Accum/binary, Char>>).
>
> test_string_copy(Bin) ->
> test_string_copy(Bin, []).
> test_string_copy([], Accum) ->
> lists:reverse(Accum);
> test_string_copy([Char|Rest], Accum) ->
> test_string_copy(Rest, [Char|Accum]).
>
> For what its worth this is part of a json module. The current practice
> in json libraries seems to favor binaries, so I assumed there were
> inherent performance advantages. I can imagine, e.g., that an empty
> binary would be stored as a modest sized buffer that would be appended
> in place until there was a need to expand it or copy (e.g. if an older
> version of it was being appended), and that operations on it would be
> fast compared to arbitrary consing (which is however highly optimized.)
>
> I think some of the favoritism for binaries in json libs is because it
> makes it easy to differentiate json strings (as erlang binaries) from
> json arrays (as erlang lists), but my implementation is using tagged
> tuples to contain each json value, so this is not a concern. Of course
> there are the memory concerns, but in my tests any memory concerns with
> list char size vs binary bytes is erased by the performance gains.
>
> I'm sure I've put my foot in my mouth at least once, but, anyway, advice
> appreciated.
>
> Thanks,
> Erik.
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20121023/679f69af/attachment.htm>
More information about the erlang-questions
mailing list