[erlang-questions] re:split not splitting at pipe symbol

Alain O'Dea <>
Sat Jan 8 21:37:26 CET 2011


string:tokens/2 can be a better choice than re:split/2 when you don't need to know the position of each token.  The catch is that string:tokens/2 silently drops empty tokens.

If you have a record like:
GivenName|MiddleName|Surname

And some records don't have middle names like:
Alain||O'Dea

Then string:tokens/2 will give you ["Alain","O'Dea"]

This may be okay, but in my experience it complicates the code that consumes the tokens, since it becomes hard to tell which positional field of the delimited record each token came from.
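
To make the difference concrete, here is a minimal sketch that keeps the empty middle-name field in its position. The module name pipe_fields and the helper fields/1 are made up for illustration, and it uses re:split/3 with {return, list} so the results are strings rather than binaries:

```erlang
-module(pipe_fields).
-export([fields/1, demo/0]).

%% Hypothetical helper: split a pipe-delimited record while keeping
%% empty fields in position. The pipe is escaped because a bare "|"
%% means alternation in a regular expression.
fields(Line) ->
    re:split(Line, "\\|", [{return, list}]).

demo() ->
    %% string:tokens/2 silently drops the empty middle name.
    ["Alain", "O'Dea"] = string:tokens("Alain||O'Dea", "|"),
    %% re:split/3 keeps it as an empty string, so every field stays
    %% at its expected index and can be pattern matched directly.
    [Given, Middle, Surname] = fields("Alain||O'Dea"),
    {Given, Middle, Surname}.
```

With the fields kept in place, a three-element pattern match is enough to tell a missing middle name from a malformed record.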

On 2011-01-08, at 15:29, Andrey Pampukha <> wrote:

> I thought, string:tokens/2 is enough for such purpose :)
> 1> string:tokens("123|456","|").
> ["123","456"]
> 
> Andrey.
> 
> 2011/1/8, Alain O'Dea <>:
>> On 2011-01-08, at 11:02, Dirk Scharff <> wrote:
>> 
>>> Hi all!
>>> 
>>> I'm experiencing a problem with re:split/2 which doesn't seem to be able
>>> to split at a pipe symbol "|". Is there a reason for this or is this a
>>> bug?
>>> 
>>> Reproduce as follows:
>>> 1> re:split("123|456","|").
>> 
>> Pipe is a meaningful symbol in regular expressions. It means 'or'.  You need
>> to escape the pipe:
>> 
>>    re:split("123|456","\\|").
>> 
>>> [<<"1">>,<<"2">>,<<"3">>,<<"|">>,<<"4">>,<<"5">>,<<"6">>,
>>> <<>>]
>>> 2> re:split("123|456",[124]).
>>> [<<"1">>,<<"2">>,<<"3">>,<<"|">>,<<"4">>,<<"5">>,<<"6">>,
>>> <<>>]
>>> 3> regexp:split("123|456","|").
>>> {ok,["123","456"]}
>>> 
>>> Regards,
>>> Dirk.
>>> ________________________________________________________________
>>> erlang-questions (at) erlang.org mailing list.
>>> See http://www.erlang.org/faq.html
>>> To unsubscribe; mailto:
>>> 
>> 
>> 
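
For reference, the escaping fix from the quoted reply can be checked with a small sketch like the one below. The module name pipe_escape is made up for illustration. An unescaped "|" is an alternation between two empty patterns, so it matches the empty string between characters, which is why the original call split the input into single characters. The character class "[|]" is an alternative I believe is equivalent to escaping, since a pipe inside a class is taken literally:

```erlang
-module(pipe_escape).
-export([demo/0]).

demo() ->
    %% Unescaped: "|" is regex alternation, not a literal pipe, so the
    %% input is not split at the pipe as intended. The exact result is
    %% returned rather than asserted here.
    Unescaped = re:split("123|456", "|"),
    %% Escaped: "\\|" in Erlang source is the two-character regex \|,
    %% which matches a literal pipe.
    [<<"123">>, <<"456">>] = re:split("123|456", "\\|"),
    %% Inside a character class the pipe has no special meaning.
    [<<"123">>, <<"456">>] = re:split("123|456", "[|]"),
    Unescaped.
```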

