[erlang-questions] [enhancement] string:split/2

Mon Oct 13 12:14:04 CEST 2008

2008/10/13 Richard O'Keefe <ok@REDACTED>

>
> On 13 Oct 2008, at 9:22 am, Robert Virding wrote:
>
>> I suggest a function string:split/2 which splits the input string with the
>> separator and assumes that there is no separator and either end of the
>> string. So:
>>
>> string:split("ab:de:fg", ":") ==> ["ab","de","fg"]
>> string:split(":ab:de:fg:", ":") ==> [[],"ab","de","fg",[]]
>>
>
> This violates the assumption "that there is no separator [at]
> either end of the string".  To make that assumption is to
> assume that something that _looks_ like a separator at either
> end _isn't_ one, so
> string:split(":ab:de:fg:", ":") ==> [":ab","de","fg:"]
> under that assumption.
>
> I _think_ you are talking about the issue that came up
> last week with *multicharacter* separators:
>   string:split("a:::b", "::")
>   ==> ["a:","b"]
>   or  ["a",":b"]  -- the code I posted does this

That was me being extremely unclear. The examples I showed describe what I
mean, not what I wrote. I think the function should search for the
separator from left to right, as in your second example. The question is how
to describe in text what it returns when it finds the separator at the
beginning or at the end of the string.
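One possible wording, offered only as a sketch: every occurrence of the separator found in a left-to-right scan ends one field and starts the next, so N occurrences always give N+1 fields, and a field is empty when nothing lies between two boundaries (or between a boundary and an end of the string). Under that reading, joining the fields with the separator gives back the original string. The module and function names below are hypothetical, and lists:join/2 assumes a reasonably recent OTP release.

```erlang
%% Illustrative sketch only; module and function names are hypothetical.
-module(split_sketch).
-export([split/2, join/2, demo/0]).

%% split(String, Sep) -> [Field]. Sep must be a non-empty string.
split(S, [_|_] = Sep) -> split(S, Sep, [], []).

%% Cur is the current field (reversed); Acc holds finished fields, newest first.
split([], _Sep, Cur, Acc) ->
    lists:reverse([lists:reverse(Cur) | Acc]);
split([H|T] = S, Sep, Cur, Acc) ->
    case lists:prefix(Sep, S) of
        true ->
            %% Separator found here: close the current field, skip the separator.
            Rest = lists:nthtail(length(Sep), S),
            split(Rest, Sep, [], [lists:reverse(Cur) | Acc]);
        false ->
            split(T, Sep, [H|Cur], Acc)
    end.

%% join(Fields, Sep) -> String. Puts Sep back between the fields.
join(Fields, Sep) -> lists:append(lists:join(Sep, Fields)).

demo() ->
    ["ab","de","fg"]       = split("ab:de:fg", ":"),
    [[],"ab","de","fg",[]] = split(":ab:de:fg:", ":"),
    ["a",":b"]             = split("a:::b", "::"),
    ":ab:de:fg:"           = join(split(":ab:de:fg:", ":"), ":"),
    "a:::b"                = join(split("a:::b", "::"), "::"),
    ok.
```

The empty lists at either end of the second result are what make the round trip through join/2 work.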

>> This also matches what will be in the re module (if they get it right) and
>> what is in the old regexp module.
>>
>
> No, it DOESN'T match it.  It CONFLICTS with regexp splitting.
> In fact, that's why it is useful!

regexp:split("ab:cd:ef", ":"). ==> {ok,["ab","cd","ef"]}
regexp:split(":ab:cd:ef:", ":"). ==> {ok,[[],"ab","cd","ef",[]]}
regexp:split(":::ab:::cd::ef", "::"). ==> {ok,[[],":ab",":cd","ef"]}

Or have I misunderstood you? I don't know what the new re:split will do;
hopefully the same thing.
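For comparison, here is the same check written against the re module as it exists in current OTP releases. This is an assumption about today's re:split/3, not about what was planned at the time of this thread. The {return, list} option asks for strings rather than binaries, and the module name is hypothetical.

```erlang
%% Illustrative check only; the module name is hypothetical.
-module(re_split_demo).
-export([demo/0]).

demo() ->
    Opts = [{return, list}],
    ["ab","cd","ef"]       = re:split("ab:cd:ef", ":", Opts),
    [[],"ab","cd","ef",[]] = re:split(":ab:cd:ef:", ":", Opts),
    [[],":ab",":cd","ef"]  = re:split(":::ab:::cd::ef", "::", Opts),
    ok.
```

Note that re:split/3 returns the list directly, without the {ok, ...} wrapper that regexp:split/2 uses.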

Not to be different, I have also included some code for a split/2 (named
strip/2 in the listing). I think it is a bit simpler than yours, but I make
no guarantees about speed. It would be easy to modify so that strip_sep/2
never returns but calls strip/4 directly.

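%% strip(String, Separator) -> [String].
%% Splits String at every occurrence of Separator, scanning left to right.
%% Separator must be non-empty; an empty one would never consume any input.

strip(S, Sep) -> strip(S, Sep, [], []).

%% R is the current field (reversed); Rs holds the finished fields
%% (each still reversed, newest first).
strip([H|T]=S0, Sep, R, Rs) ->
    case strip_sep(S0, Sep) of
        {yes,S1} -> strip(S1, Sep, [], [R|Rs]);
        no -> strip(T, Sep, [H|R], Rs)
    end;
strip([], _, R, Rs) -> strip_rev([R|Rs], []).

%% Reverse each field and the list of fields in a single pass.
strip_rev([R|Rs], Acc) ->
    strip_rev(Rs, [lists:reverse(R)|Acc]);
strip_rev([], Acc) -> Acc.

%% If Sep is a prefix of the string, return {yes,Rest}; otherwise no.
strip_sep([C|S], [C|Sep]) -> strip_sep(S, Sep);
strip_sep(S, []) -> {yes,S};
strip_sep(_, _) -> no.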
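
The modification mentioned before the listing, where the separator check never returns but calls the main loop directly, might look roughly like the sketch below. The module name split_tail and the function names are hypothetical, and the guard on a non-empty separator is an addition of this sketch, put there so that an empty separator fails at once instead of looping.

```erlang
%% Illustrative sketch only; module and function names are hypothetical.
-module(split_tail).
-export([split/2]).

%% split(String, Sep) -> [Field]. Sep must be a non-empty string.
split(S, [_|_] = Sep) -> split(S, Sep, [], []).

split([H|T] = S, Sep, R, Rs) -> split_sep(S, Sep, H, T, Sep, R, Rs);
split([], _Sep, R, Rs) -> rev([R|Rs], []).

%% split_sep(Str, SepRest, H, T, Sep, R, Rs)
%% Matches the separator against Str and tail-calls split/4 either way:
%% on a full match with the text after the separator, otherwise with T
%% after pushing H onto the current field.
split_sep([C|S], [C|SepRest], H, T, Sep, R, Rs) ->
    split_sep(S, SepRest, H, T, Sep, R, Rs);
split_sep(S, [], _H, _T, Sep, R, Rs) ->
    split(S, Sep, [], [R|Rs]);
split_sep(_, _, H, T, Sep, R, Rs) ->
    split(T, Sep, [H|R], Rs).

%% Reverse each field and the list of fields in a single pass.
rev([R|Rs], Acc) -> rev(Rs, [lists:reverse(R)|Acc]);
rev([], Acc) -> Acc.
```

The extra arguments carry what the failure case needs (H, T and the full separator), which is the price of not returning a {yes,Rest} or no result to the caller.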

Robert