[erlang-questions] Performance question

Loïc Hoguin <>
Fri Nov 7 18:44:04 CET 2014


Based on the code at

https://github.com/erlang/otp/blob/maint/lib/stdlib/src/binary.erl#L268

It does a lot of splitting, then a lot more splitting, and then calls
iolist_to_binary. It looks very inefficient.
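To make the split-then-join shape concrete, here is a simplified sketch for the special case of an empty replacement. It is an illustration only, not the OTP source of binary:replace/4. The module name `split_join` and the function `remove_all/2` are hypothetical.

```erlang
-module(split_join).
-export([remove_all/2]).

%% Hypothetical helper: remove every occurrence of Pattern from Bin by
%% splitting on the pattern and concatenating the pieces.
%% This only mimics the split-then-iolist_to_binary shape; it is NOT
%% the OTP implementation of binary:replace/4.
remove_all(Bin, Pattern) when is_binary(Bin), is_binary(Pattern) ->
    %% With [global] every match is a split point; empty parts are kept,
    %% which is harmless because they contribute nothing to the result.
    Parts = binary:split(Bin, Pattern, [global]),
    iolist_to_binary(Parts).
```

For example, `split_join:remove_all(<<"a\nb\n">>, <<"\n">>)` splits into `[<<"a">>, <<"b">>, <<>>]` and joins that back into `<<"ab">>`.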

Your solution is the fastest way to do it. You also benefit from the match
context optimization, so your code is very fast. The only thing that
could make it faster is if memory were allocated only once for the
resulting binary (instead of being reallocated a few times)... but maybe
there's already an optimization like this?
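One way to aim for a single allocation, sketched under assumptions: find all match positions up front with binary:matches/2, keep the untouched stretches as sub-binaries via binary:part/3 (which normally references the original binary rather than copying it), and let one iolist_to_binary/1 call size and fill the result. The module name `remove_once` and its functions are hypothetical, and whether this actually beats the hand-written loop has not been measured here.

```erlang
-module(remove_once).
-export([remove_all/2]).

%% Hypothetical sketch: locate every match first, collect the slices
%% between matches as sub-binaries, then copy them into the result in a
%% single iolist_to_binary/1 call. Pattern must be non-empty
%% (binary:matches/2 raises badarg for <<>>).
remove_all(Bin, Pattern) when is_binary(Bin), is_binary(Pattern) ->
    Matches = binary:matches(Bin, Pattern),
    iolist_to_binary(slices(Bin, 0, Matches)).

%% Return, in order, the parts of Bin that lie outside the matches.
%% Start is the first byte not yet emitted.
slices(Bin, Start, []) ->
    [binary:part(Bin, Start, byte_size(Bin) - Start)];
slices(Bin, Start, [{Pos, Len} | More]) ->
    [binary:part(Bin, Start, Pos - Start) | slices(Bin, Pos + Len, More)].
```

The design choice is to pay for the search inside the binary:matches/2 BIF and do exactly one copy at the end, instead of growing an accumulator byte by byte.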

On 11/07/2014 07:33 PM, Stu Bailey wrote:
> FYI, if you want to try to replicate it, I was processing ~80 chunks of
> binary where each chunk was about 250,000,000 bytes. I think you'll
> see the difference on just one chunk. I happen to be running on an 8-core
> MacBook Pro with 16 GB RAM and therefore spawned a process per chunk to
> grab all the resources on all the cores. With the hand-written
> function, it worked like a charm... yay Erlang! :-) I love seeing a few
> lines of code effectively use all the processing power available. Heats
> the machine up quite a bit too. :-)
>
> On Fri, Nov 7, 2014 at 9:22 AM, Stu Bailey wrote:
>
>     I'm not planning to spend a lot of time on this right now, but
>     binary:replace(...) was chewing up a tremendous amount of system
>     CPU time (and never actually finished before I got frustrated and
>     killed it), whereas my function reported its CPU load as 99% user
>     time (not system time) and finished in a reasonable time. I assume
>     the high system time for binary:replace(...) is because it is
>     doing something manic with system calls for memory management or
>     something?
>
>
>     On Fri, Nov 7, 2014 at 1:44 AM, Loïc Hoguin wrote:
>
>         binary:split and binary:replace, unlike other functions of the
>         binary module, are normal Erlang functions rather than BIFs
>         implemented in C. They also process a list of options before
>         doing the actual work, so there's an obvious overhead compared
>         to not doing that. In addition, as has been pointed out, your
>         code is more specialized, so that helps too.
>
>         On 11/07/2014 03:33 AM, Stu Bailey wrote:
>
>             I found
>
>             binary:replace(BinChunk, <<"\n">>, <<>>, [global]).
>
>             /significantly/ slower than
>
>             remove_pattern(BinChunk, <<>>, <<"\n">>).
>
>             with
>
>             %% Removes every occurrence of the single-byte pattern
>             %% BinPat from Bin, appending the kept bytes to Acc.
>             remove_pattern(<<>>, Acc, _BinPat) ->
>                 Acc;
>             remove_pattern(Bin, Acc, BinPat) ->
>                 <<Byte:1/binary, Rest/binary>> = Bin,
>                 case Byte == BinPat of
>                     true -> remove_pattern(Rest, Acc, BinPat);
>                     false -> remove_pattern(Rest, <<Acc/binary, Byte/binary>>, BinPat)
>                 end.
>
>             That was surprising to me. The built-in binary:replace()
>             was much, much slower for larger BinChunks with lots of
>             <<"\n">> sprinkled throughout.
>
>             Thoughts?
>
>
>             _________________________________________________
>             erlang-questions mailing list
>             http://erlang.org/mailman/listinfo/erlang-questions
>
>
>         --
>         Loïc Hoguin
>         http://ninenines.eu
>
>
>
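Stu describes spawning one process per chunk to keep all cores busy. A minimal sketch of that pattern, assuming the chunks are already in a list and reusing the hand-written remove_pattern/3 from the thread: each worker sends its result back tagged with a unique reference, so results are collected in the original chunk order. The module name `chunk_par` and the function `remove_newlines_parallel/1` are hypothetical.

```erlang
-module(chunk_par).
-export([remove_newlines_parallel/1, remove_pattern/3]).

%% Hypothetical sketch: one linked worker process per chunk; results are
%% gathered in input order by receiving on each worker's unique Ref.
remove_newlines_parallel(Chunks) when is_list(Chunks) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                    Parent ! {Ref, remove_pattern(Chunk, <<>>, <<"\n">>)}
                end),
                Ref
            end || Chunk <- Chunks],
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% The single-byte removal loop from the thread, with the match moved
%% into the function head.
remove_pattern(<<>>, Acc, _BinPat) ->
    Acc;
remove_pattern(<<Byte:1/binary, Rest/binary>>, Acc, BinPat) ->
    case Byte == BinPat of
        true -> remove_pattern(Rest, Acc, BinPat);
        false -> remove_pattern(Rest, <<Acc/binary, Byte/binary>>, BinPat)
    end.
```

Because the workers are linked, a crash in any chunk takes the caller down too, which is usually what you want for a batch job. Large binaries sent back to the parent are reference-counted, so the message send should not copy the chunk data itself.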

-- 
Loïc Hoguin
http://ninenines.eu

