[erlang-questions] splitting binaries on a separator

Kenneth Lundin kenneth.lundin@REDACTED
Thu May 28 19:18:42 CEST 2009


Why not use the re:split/2 or re:split/3 functions like this:

Eshell V5.7.1  (abort with ^G)
1> re:split(<<"Erlang">>,"l").
[<<"Er">>,<<"ang">>]

The second argument is a regular expression defining the character
sequence to split at.

re is mostly using BIF's written in C so it will be fast. And it is
convenient too (using the split function).

/Kenneth, Erlang/OTP Ericsson

On Thu, May 28, 2009 at 7:01 PM, David Mercer <dmercer@REDACTED> wrote:
>> Couldn't you just first step down the binary until you find the index of
>> the
>> separator and then do a split_binary/2 on the original? This would save
>> incrementally building the leading binary. Or have I missed something?
>
> I don't think he is building the binary incrementally.  I believe the two
> parts are built by the instructions:
>
>> >    case Bin of
>> >        <<Bin1:N/binary, Sep:SepSize/binary>> ->
>> >            {ok, Bin1, <<>>};
>> >        <<Bin1:N/binary, Sep:SepSize/binary, Rest/binary>> ->
>> >            {ok, Bin1, Rest};
>
> N increases with each call:
>
>> >        _ ->
>> >            split_bin(1, Sep, SepSize, Bin)
>
> So if Sep is in the Bin, it will eventually be found, and then Bin1 is the
> portion before the Sep.
>
> If we assume SepSize will usually be 1, then his algorithm looks OK (there
> may be some micro-optimizations).  If SepSize >> 1, however, a better
> searching algorithm may be beneficial (e.g., Rabin-Karp,
> Knuth-Morris-Pratt).  (I'm just name-dropping those - I am not an expert in
> string-searching.)
>
> David Mercer
>
>
>> -----Original Message-----
>> From: erlang-questions@REDACTED [mailto:erlang-questions@REDACTED] On
>> Behalf Of Robert Virding
>> Sent: Thursday, May 28, 2009 11:10 AM
>> To: Joel Reymont
>> Cc: erlang-questions@REDACTED
>> Subject: Re: [erlang-questions] splitting binaries on a separator
>>
>> Couldn't you just first step down the binary until you find the index of
>> the
>> separator and then do a split_binary/2 on the original? This would save
>> incrementally building the leading binary. Or have I missed something?
>>
>> Robert
>>
>> 2009/5/28 Joel Reymont <joelr1@REDACTED>
>>
>> > Is there a more efficient way of splitting binaries on a separator
>> > sequence?
>> >
>> >        Thanks, Joel
>> >
>> > ---
>> >
>> > split_bin(Sep, Bin) ->
>> >    split_bin(Sep, byte_size(Sep), Bin).
>> >
>> > split_bin(Sep, SepSize, Bin) ->
>> >    case Bin of
>> >        <<Sep:SepSize/binary, Rest/binary>> ->
>> >            {ok, <<>>, Rest};
>> >        _ ->
>> >            split_bin(1, Sep, SepSize, Bin)
>> >    end.
>> >
>> > split_bin(N, Sep, SepSize, Bin)
>> >  when N < byte_size(Bin) ->
>> >    case Bin of
>> >        <<Bin1:N/binary, Sep:SepSize/binary>> ->
>> >            {ok, Bin1, <<>>};
>> >        <<Bin1:N/binary, Sep:SepSize/binary, Rest/binary>> ->
>> >            {ok, Bin1, Rest};
>> >        _ ->
>> >            split_bin(N + 1, Sep, SepSize, Bin)
>> >    end;
>> >
>> > split_bin(_, _, _, Bin) ->
>> >    {more, Bin}.
>> >
>> > ---
>> > Mac hacker with a performance bent
>> > http://www.linkedin.com/in/joelreymont
>> >
>> >
>> > ________________________________________________________________
>> > erlang-questions mailing list. See http://www.erlang.org/faq.html
>> > erlang-questions (at) erlang.org
>> >
>> >
>
>
> ________________________________________________________________
> erlang-questions mailing list. See http://www.erlang.org/faq.html
> erlang-questions (at) erlang.org
>
>


More information about the erlang-questions mailing list