[erlang-questions] Performance question

Stu Bailey
Fri Nov 7 18:50:31 CET 2014


Thank you for the feedback.  That's very helpful.


On Fri, Nov 7, 2014 at 9:44 AM, Loïc Hoguin wrote:

> Based on the code at
>
> https://github.com/erlang/otp/blob/maint/lib/stdlib/src/binary.erl#L268
>
> It does a lot of splitting, and then a lot more splitting, and then calls
> iolist_to_binary. It looks very inefficient.
>
> Your solution is the fastest way to do it. You also benefit from match
> context optimization and so your code is very fast. The only thing that
> could make it faster is if memory were allocated only once for the resulting
> binary (instead of being reallocated a few times)... but maybe there's
> already an optimization like this?
>
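A minimal sketch of the kind of specialised loop discussed above, assuming the only pattern of interest is a single newline byte. The module and function names (`strip_nl`, `strip/1`) are made up for illustration and do not come from the thread. Matching in the function heads and passing `Rest` straight to the recursive call is the shape the Efficiency Guide describes as eligible for the match context optimisation, and appending to `Acc` is the case it describes the runtime as handling by over-allocating, which may be the existing optimisation wondered about above. Treat both as expectations to verify (for example by compiling with the `bin_opt_info` option), not as guarantees.

```erlang
-module(strip_nl).
-export([strip/1]).

%% Remove every newline byte from Bin and return the remaining bytes.
strip(Bin) when is_binary(Bin) ->
    strip(Bin, <<>>).

%% Newline: skip it and keep scanning the rest of the input.
strip(<<$\n, Rest/binary>>, Acc) ->
    strip(Rest, Acc);
%% Any other byte: append it to the accumulator.
strip(<<Byte, Rest/binary>>, Acc) ->
    strip(Rest, <<Acc/binary, Byte>>);
%% End of input: the accumulator is the result.
strip(<<>>, Acc) ->
    Acc.
```

Because the newline is matched directly in the clause head, there is no per-byte comparison against a pattern argument and no options list to process, which is the "more specialized" advantage mentioned elsewhere in the thread.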
> On 11/07/2014 07:33 PM, Stu Bailey wrote:
>
>> FYI, if you want to try to replicate it, I was processing ~80 chunks of
>> binary where each chunk was ~250,000,000 bytes.  I think you'll
>> see the difference on just one chunk.  I happen to be running on an 8-core
>> MacBook Pro with 16 GB of RAM and therefore spawned a process per chunk to
>> grab all the resources on all the cores.  With the hand-written
>> function, it worked like a charm...yay Erlang! :-)  I love seeing a few
>> lines of code effectively use all processing power available.  Heats the
>> machine up quite a bit too. :-)
>>
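The process-per-chunk setup described above could look roughly like the following. This is a sketch under assumptions, not the code that was actually run: the module name `chunk_pmap` and the function `pmap/2` are hypothetical, and the per-chunk work is passed in as a fun so the example does not depend on any particular stripping function.

```erlang
-module(chunk_pmap).
-export([pmap/2]).

%% Apply Fun to every chunk, one process per chunk, and return the
%% results in the same order as the input list.
pmap(Fun, Chunks) when is_function(Fun, 1), is_list(Chunks) ->
    Parent = self(),
    Refs = [spawn_worker(Parent, Fun, Chunk) || Chunk <- Chunks],
    %% Ref is already bound by the generator, so each receive waits
    %% for the reply of one specific worker.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Start one linked worker and return the reference that tags its reply.
spawn_worker(Parent, Fun, Chunk) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, Fun(Chunk)} end),
    Ref.
```

A call such as `chunk_pmap:pmap(fun(C) -> binary:replace(C, <<"\n">>, <<>>, [global]) end, Chunks)` would then run one worker per chunk. The workers are linked, so a crash in any of them also takes down the caller, and there is no timeout on the receive; both are simplifications made for brevity.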
>> On Fri, Nov 7, 2014 at 9:22 AM, Stu Bailey wrote:
>>
>>     I'm not planning to spend a lot of time on this right now, but
>>     binary:replace(...) was chewing up a tremendous amount of system-time
>>     CPU load (and actually never finished before I got frustrated and
>>     killed it), whereas my function showed the CPU load as 99% user
>>     time (not system time) and finished in a reasonable time.  I assume
>>     the high system time usage for binary:replace(...) is because
>>     it is doing something manic with system calls for
>>     memory management or something?
>>
>>
>>     On Fri, Nov 7, 2014 at 1:44 AM, Loïc Hoguin wrote:
>>
>>         binary:split and binary:replace, unlike other functions of the
>>         binary module, are normal Erlang functions. They also process a
>>         list of options before doing the actual work, so there's an
>>         obvious overhead compared to not doing that. In addition, as has
>>         been pointed out, your code is more specialized, so that helps too.
>>
>>         On 11/07/2014 03:33 AM, Stu Bailey wrote:
>>
>>             I found
>>
>>             binary:replace(BinChunk,<<"\n">>,<<>>,[global]).
>>
>>             /significantly/ slower than
>>
>>             remove_pattern(BinChunk,<<>>,<<"\n">>).
>>
>>             with
>>
>>             %% BinPat must be a one-byte binary, since it is compared
>>             %% against one byte of input at a time.
>>             remove_pattern(<<>>,Acc,_BinPat) ->
>>                   Acc;
>>             remove_pattern(Bin,Acc,BinPat) ->
>>                   <<Byte:1/binary,Rest/binary>> = Bin,
>>                   case Byte == BinPat of
>>                       true -> remove_pattern(Rest,Acc,BinPat);
>>                       false ->
>>                           remove_pattern(Rest,<<Acc/binary,Byte/binary>>,BinPat)
>>                   end.
>>
>>             That was surprising to me.  The built-in binary:replace()
>>             was much, much slower for a larger BinChunk with lots of
>>             <<"\n">> sprinkled through.
>>
>>             Thoughts?
>>
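For anyone wanting to reproduce the comparison on a single chunk, a small timing harness along these lines might help. It is only a sketch: `replace_bench` and `compare/2` are hypothetical names, the candidate implementation is supplied by the caller as a fun, and a single `timer:tc/1` run is a rough wall-clock measurement that says nothing about the user versus system time split.

```erlang
-module(replace_bench).
-export([compare/2]).

%% Time binary:replace/4 and Candidate on the same input.
%% Returns {BuiltinMicros, CandidateMicros}. The second match on the
%% already bound Expected fails with badmatch if the outputs differ.
compare(Bin, Candidate) when is_binary(Bin), is_function(Candidate, 1) ->
    {BuiltinMicros, Expected} =
        timer:tc(fun() -> binary:replace(Bin, <<"\n">>, <<>>, [global]) end),
    {CandidateMicros, Expected} =
        timer:tc(fun() -> Candidate(Bin) end),
    {BuiltinMicros, CandidateMicros}.
```

Assuming the hand-written function is exported from some module (called `my_mod` here purely for illustration), a run could look like `replace_bench:compare(binary:copy(<<"abc\n">>, 1000000), fun(B) -> my_mod:remove_pattern(B, <<>>, <<"\n">>) end).`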
>>
>>             _______________________________________________
>>             erlang-questions mailing list
>>             erlang-questions@erlang.org
>>             http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>>         --
>>         Loïc Hoguin
>>         http://ninenines.eu
>>
>>
>>
>>
> --
> Loïc Hoguin
> http://ninenines.eu
>