Hi,<div>I've read advice from many years ago that when processing a binary byte by byte (e.g. in a recursive parser), it is faster to accumulate the bytes in a list than to use binary concatenation: so [B|Accum] rather than <<Accum/binary, B>>. At the same time, there seems to be a consensus that binaries are more efficient than list strings.</div>
<div><br></div><div>My own quick test, which simply copied a list or binary element by element, showed much better performance for the list version. The test took an arbitrary string or binary, copied it thousands of times, and reported the number of complete copies per second.</div>
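<div><br></div><div>Roughly, the timing loop can be sketched like this. It is an illustration rather than the exact code I ran: copies_per_second/3 and repeat/3 are names made up for this example, and it assumes timer:tc/1 from the standard library, which reports elapsed time in microseconds.</div><div><br></div><div>%% Run CopyFun on Data N times and return complete copies per second.</div><div>copies_per_second(CopyFun, Data, N) -></div><div> {Micros, ok} = timer:tc(fun() -> repeat(CopyFun, Data, N) end),</div><div> %% Guard against a zero reading on very short runs.</div><div> N * 1000000 / max(Micros, 1).</div><div><br></div><div>repeat(_CopyFun, _Data, 0) -></div><div> ok;</div><div>repeat(CopyFun, Data, N) when N > 0 -></div><div> _ = CopyFun(Data),</div><div> repeat(CopyFun, Data, N - 1).</div><div><br></div><div>From inside the same module it would be called as, say, copies_per_second(fun test_string_copy/1, "some string", 10000).</div>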
<div><br></div><div>I also tried list-based accumulation for a binary, destructuring the binary in the function head. That sped things up, but it was still slower than the equivalent list string copy.</div><div><br></div>
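<div>The list-accumulating binary variant looks something like the following. This is a sketch for illustration (test_bin_copy_via_list is just a name chosen for the example): the bytes are consed onto a list, which is reversed and converted back to a binary once at the end.</div><div><br></div><div>test_bin_copy_via_list(Bin) -></div><div> test_bin_copy_via_list(Bin, []).</div><div>test_bin_copy_via_list(<<>>, Accum) -></div><div> %% Accum holds the bytes in reverse order, so reverse before converting.</div><div> list_to_binary(lists:reverse(Accum));</div><div>test_bin_copy_via_list(<<Char, Rest/binary>>, Accum) -></div><div> test_bin_copy_via_list(Rest, [Char|Accum]).</div><div><br></div>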
<div>Are there any tips for binaries? Or is this simply not a good use case for binaries?</div><div><br></div><div><div>test_bin_copy(Bin) -></div><div> test_bin_copy(Bin, <<>>).</div><div>test_bin_copy(<<>>, Accum) -></div>
<div> Accum;</div><div>test_bin_copy(<<Char, Rest/binary>>, Accum) -></div><div> test_bin_copy(Rest, <<Accum/binary, Char>>).</div><div><br></div><div>test_string_copy(Str) -></div><div>
 test_string_copy(Str, []).</div><div>test_string_copy([], Accum) -></div><div> lists:reverse(Accum);</div><div>test_string_copy([Char|Rest], Accum) -></div><div> test_string_copy(Rest, [Char|Accum]).</div>
</div><div><br></div><div><div>For what it's worth, this is part of a JSON module. The current practice in JSON libraries seems to favor binaries, so I assumed there were inherent performance advantages. I can imagine, for example, that an empty binary would be stored as a modest-sized buffer that is appended to in place until it needs to be expanded or copied (e.g. if an older version of it were being appended to), and that operations on it would be fast compared to arbitrary consing (which is, however, highly optimized).</div>
</div><div><br></div><div>I think some of the preference for binaries in JSON libraries comes from the fact that it makes it easy to distinguish JSON strings (as Erlang binaries) from JSON arrays (as Erlang lists), but my implementation uses tagged tuples to contain each JSON value, so this is not a concern. Of course there is the memory cost to consider, but in my tests the extra memory used by list characters compared to binary bytes is outweighed by the performance gains.</div>
<div><br></div><div>I'm sure I've put my foot in my mouth at least once, but, anyway, advice appreciated.</div><div><br></div><div>Thanks,</div><div>Erik.</div>