<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 20, 2015 at 1:14 PM, José Valim <span dir="ltr"><<a href="mailto:jose.valim@plataformatec.com.br" target="_blank">jose.valim@plataformatec.com.br</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Thoughts?</blockquote></div><br>Matching:</div><div class="gmail_extra"><br></div><div class="gmail_extra">The structure of the Needle and the Haystack matter:</div><div class="gmail_extra"><br></div><div class="gmail_extra">1) In a search for Supercalifragilisticexpialidocious, search algorithms such as Boyer-Moore wins. GNU Grep even unrolls the inner BM loop to make it even faster.</div><div class="gmail_extra">2) If the needle is a $, separator in a CSV file, then naive string seach can do surprisingly well.</div><div class="gmail_extra">3) If you are searching for DNA sequence matches, the low alphabet of GACT means you have lots of partial matches. In that case the "standard" algorithm is Knuth-Morris-Pratt.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Part'ing:</div><div class="gmail_extra"><br></div><div class="gmail_extra">running part is O(1), so constant factors will play an ever increasing role. Importantly in a call binary:part(Bin, Pos), the part/2 BIF can't know a priori that Pos is valid for the binary. So it has to run a bounds check, even though binary:matches/3 will always provide valid positions. Chances are that a low-level naive approach written in a statically typed language is able to eliminate those bounds checks because it knows stuff.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Fusion:</div><div class="gmail_extra"><br></div><div class="gmail_extra">Erlang can't fuse the construction of the MatchList with the reduction on the part call. So you build up a list of matches and then you consume that list right after. Compilers with a fuse-pass can eliminate these intermediate lists.<br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">J.</div>
</div></div>