[erlang-questions] splitting binaries on a separator

David Mercer dmercer@REDACTED
Thu May 28 19:01:52 CEST 2009


> Couldn't you just first step down the binary until you find the index of
> the
> separator and then do a split_binary/2 on the original? This would save
> incrementally building the leading binary. Or have I missed something?

I don't think he is building the binary incrementally.  I believe the two
parts are built by the instructions:

> >    case Bin of
> >        <<Bin1:N/binary, Sep:SepSize/binary>> ->
> >            {ok, Bin1, <<>>};
> >        <<Bin1:N/binary, Sep:SepSize/binary, Rest/binary>> ->
> >            {ok, Bin1, Rest};

N increases with each call:

> >        _ ->
> >            split_bin(1, Sep, SepSize, Bin)

So if Sep is in the Bin, it will eventually be found, and then Bin1 is the
portion before the Sep.

If we assume SepSize will usually be 1, then his algorithm looks OK (there
may be some micro-optimizations).  If SepSize >> 1, however, a better
searching algorithm may be beneficial (e.g., Rabin-Karp,
Knuth-Morris-Pratt).  (I'm just name-dropping those - I am not an expert in
string-searching.)

David Mercer


> -----Original Message-----
> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
> Behalf Of Robert Virding
> Sent: Thursday, May 28, 2009 11:10 AM
> To: Joel Reymont
> Cc: erlang-questions@REDACTED
> Subject: Re: [erlang-questions] splitting binaries on a separator
> 
> Couldn't you just first step down the binary until you find the index of
> the
> separator and then do a split_binary/2 on the original? This would save
> incrementally building the leading binary. Or have I missed something?
> 
> Robert
> 
> 2009/5/28 Joel Reymont <joelr1@REDACTED>
> 
> > Is there a more efficient way of splitting binaries on a separator
> > sequence?
> >
> >        Thanks, Joel
> >
> > ---
> >
> > split_bin(Sep, Bin) ->
> >    split_bin(Sep, byte_size(Sep), Bin).
> >
> > split_bin(Sep, SepSize, Bin) ->
> >    case Bin of
> >        <<Sep:SepSize/binary, Rest/binary>> ->
> >            {ok, <<>>, Rest};
> >        _ ->
> >            split_bin(1, Sep, SepSize, Bin)
> >    end.
> >
> > split_bin(N, Sep, SepSize, Bin)
> >  when N < byte_size(Bin) ->
> >    case Bin of
> >        <<Bin1:N/binary, Sep:SepSize/binary>> ->
> >            {ok, Bin1, <<>>};
> >        <<Bin1:N/binary, Sep:SepSize/binary, Rest/binary>> ->
> >            {ok, Bin1, Rest};
> >        _ ->
> >            split_bin(N + 1, Sep, SepSize, Bin)
> >    end;
> >
> > split_bin(_, _, _, Bin) ->
> >    {more, Bin}.
> >
> > ---
> > Mac hacker with a performance bent
> > http://www.linkedin.com/in/joelreymont
> >
> >
> > ________________________________________________________________
> > erlang-questions mailing list. See http://www.erlang.org/faq.html
> > erlang-questions (at) erlang.org
> >
> >



More information about the erlang-questions mailing list