[erlang-questions] splitting binaries on a separator

Robert Virding rvirding@REDACTED
Fri May 29 01:15:15 CEST 2009


2009/5/28 David Mercer <dmercer@REDACTED>

> > Couldn't you just first step down the binary until you find the index of
> > the
> > separator and then do a split_binary/2 on the original? This would save
> > incrementally building the leading binary. Or have I missed something?
>
> I don't think he is building the binary incrementally.  I believe the two
> parts are built by the instructions:
>
> > >    case Bin of
> > >        <<Bin1:N/binary, Sep:SepSize/binary>> ->
> > >            {ok, Bin1, <<>>};
> > >        <<Bin1:N/binary, Sep:SepSize/binary, Rest/binary>> ->
> > >            {ok, Bin1, Rest};
>

This does not explicitly build it but the matching operation may very well
do it before it realises that the separator is not following. I don't know
if the binary matching delays creating until the match has succeeded.

If we assume SepSize will usually be 1, then his algorithm looks OK (there
> may be some micro-optimizations).  If SepSize >> 1, however, a better
> searching algorithm may be beneficial (e.g., Rabin-Karp,
> Knuth-Morris-Pratt).  (I'm just name-dropping those - I am not an expert in
> string-searching.)


There are undoubtedly better search algorithms, I know of them but I don't
know them. Boyer-Moore used to be good. :-)

Robert


More information about the erlang-questions mailing list