[erlang-questions] Trying to understand the performance impact of binary:split/3
Wed May 20 14:41:41 CEST 2015
Thanks Jesper, breaking the algorithm into steps really helps evaluating
where we can optimize.
In this particular case, matching is fast (I have measured independently)
and the generated list contains always 4 elements. That leaves us with
I wonder how fast it would be if binary:split/3 was written as a BIF.
Founder and Lead Developer
On Wed, May 20, 2015 at 2:28 PM, Jesper Louis Andersen <
> On Wed, May 20, 2015 at 1:14 PM, José Valim <
> jose.valim@REDACTED> wrote:
> The structure of the Needle and the Haystack matter:
> 1) In a search for Supercalifragilisticexpialidocious, search algorithms
> such as Boyer-Moore wins. GNU Grep even unrolls the inner BM loop to make
> it even faster.
> 2) If the needle is a $, separator in a CSV file, then naive string seach
> can do surprisingly well.
> 3) If you are searching for DNA sequence matches, the low alphabet of GACT
> means you have lots of partial matches. In that case the "standard"
> algorithm is Knuth-Morris-Pratt.
> running part is O(1), so constant factors will play an ever increasing
> role. Importantly in a call binary:part(Bin, Pos), the part/2 BIF can't
> know a priori that Pos is valid for the binary. So it has to run a bounds
> check, even though binary:matches/3 will always provide valid positions.
> Chances are that a low-level naive approach written in a statically typed
> language is able to eliminate those bounds checks because it knows stuff.
> Erlang can't fuse the construction of the MatchList with the reduction on
> the part call. So you build up a list of matches and then you consume that
> list right after. Compilers with a fuse-pass can eliminate these
> intermediate lists.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the erlang-questions