[erlang-questions] Trying to understand the performance impact of binary:split/3
Wed May 20 12:56:02 CEST 2015
binary:split is not fast and unfortunately many people do not realize that.
If you want speed, here is an implementation that is made for speed:
On Wed, May 20, 2015 at 12:35 PM, José Valim wrote:
> Hello folks,
> At the beginning of the month, someone wrote a blog post comparing data
> processing between different platforms and languages, one of them being
> Erlang VM/Elixir:
> After running the experiments, I thought we could do much better. To my
> surprise, our biggest performance hit was when calling binary:split/3. I
> have rewritten the code to use only Erlang function calls (to make it
> clearer for this discussion):
> The performance of the Erlang and Elixir variants is the same (rewriting
> it in Erlang gives the same result). This line is the bottleneck:
> In fact, if we move the regular expression check to before the
> binary:split/3 call, we get the same performance as Go on my machine.
> Meaning that binary:split/3 is making the code at least twice as slow.
> The binary:split/3 implementation is broken into two pieces: first we find
> all matches via binary:matches/3, and then we traverse the matches,
> converting them to binaries with binary:part/3. The binary:part/3 call is
> the slow piece here.
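To make the two-step structure described above concrete, here is a minimal sketch of a global split built from binary:matches/3 and binary:part/3. It is not the OTP source; the module name split_sketch and the helpers split_all/2 and cut/3 are made up for illustration.

```erlang
-module(split_sketch).
-export([split_all/2]).

%% Split Subject on every occurrence of Pattern, in two passes:
%%   1. binary:matches/3 returns a list of {Pos, Len} for each occurrence;
%%   2. binary:part/3 cuts out the pieces that lie between the matches.
split_all(Subject, Pattern) ->
    Matches = binary:matches(Subject, Pattern, []),
    cut(Subject, 0, Matches).

%% No matches left: the remainder of the subject is the last piece.
cut(Subject, Start, []) ->
    [binary:part(Subject, Start, byte_size(Subject) - Start)];
%% Emit the piece before this match, then continue after the match.
cut(Subject, Start, [{Pos, Len} | Rest]) ->
    [binary:part(Subject, Start, Pos - Start) | cut(Subject, Pos + Len, Rest)].
```

For the inputs I tried, this agrees with binary:split(Subject, Pattern, [global]). Note that it builds the full list of matches before a single part is produced, which is the shape being asked about.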
> *My question is:* is this expected? Why is binary:split/3 (and
> binary:part/3) affecting performance so drastically? How can I
> investigate/understand this further?
> ## Other bottlenecks
> The other two immediate bottlenecks are the use of regular expressions and
> the use of file:read_line/1 instead of loading the whole file into memory.
> Those were given as hard requirements by the author. Nonetheless, someone
> wrote an Erlang implementation that removes those bottlenecks too (along
> with binary:split/3) and the performance is outstanding:
> I have since then rewritten the Elixir one and got a similar result.
> However, I am still puzzled, because using binary:split/3 would have been my
> first try (instead of relying on match+part) as it leads to cleaner code.
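For readers wondering what "match+part" over a whole file might look like, here is a rough sketch. It is not the implementation referred to above; the module lines_sketch and the functions fold_file/3 and fold_lines/3 are hypothetical names. It reads the file once with file:read_file/1 and then walks it line by line with binary:match/3 (using the scope option) and binary:part/3, so no intermediate list of lines is built.

```erlang
-module(lines_sketch).
-export([fold_file/3, fold_lines/3]).

%% Read the whole file into one binary, then fold Fun over its lines.
fold_file(Path, Fun, Acc) ->
    {ok, Bin} = file:read_file(Path),
    fold_lines(Bin, Fun, Acc).

%% Call Fun(Line, Acc) for each newline-separated line in Bin.
fold_lines(Bin, Fun, Acc) ->
    Newline = binary:compile_pattern(<<"\n">>),
    fold_lines(Bin, Newline, 0, Fun, Acc).

%% Past the end of the binary: nothing left to scan.
fold_lines(Bin, _Newline, Start, _Fun, Acc) when Start >= byte_size(Bin) ->
    Acc;
fold_lines(Bin, Newline, Start, Fun, Acc) ->
    Size = byte_size(Bin),
    %% Search only the unscanned tail; Pos is still relative to Bin.
    case binary:match(Bin, Newline, [{scope, {Start, Size - Start}}]) of
        {Pos, 1} ->
            Line = binary:part(Bin, Start, Pos - Start),
            fold_lines(Bin, Newline, Pos + 1, Fun, Fun(Line, Acc));
        nomatch ->
            %% Last line without a trailing newline.
            Fun(binary:part(Bin, Start, Size - Start), Acc)
    end.
```

A caller could count lines with fold_lines(Bin, fun(_Line, N) -> N + 1 end, 0), or run whatever per-line check is needed inside Fun.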
> *José Valim*
> Skype: jv.ptec
> Founder and Lead Developer