[erlang-questions] any way to speed up regex.split?

akonsu akonsu@REDACTED
Sat Dec 28 16:13:43 CET 2013


Thanks. I am trying to get all captures for my first group (\\w+), which is
inside the non-capturing group, but I am getting only the last capture. For
this specific example I need to get [<<"a">>,<<"b">>,<<"c">>].

konstantin
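
A repeated capturing group reports only its last iteration, so a single
anchored match cannot return every word. One possible workaround, sketched
here with the standard re module rather than re2 (the thread does not say
whether re2 accepts the same options), is to match a single word globally,
or to split on the separators. The module name all_words and both function
names are hypothetical, chosen only for illustration.

```erlang
-module(all_words).
-export([with_run/1, with_split/1]).

%% Match one word at a time with the global option; each occurrence
%% comes back as its own one-element list, which is flattened here.
with_run(Bin) ->
    case re:run(Bin, <<"\\w+">>, [global, {capture, first, binary}]) of
        {match, Matches} -> [W || [W] <- Matches];
        nomatch -> []
    end.

%% Split on runs of non-word characters and drop the empty pieces
%% that leading or trailing separators would leave behind.
with_split(Bin) ->
    Parts = re:split(Bin, <<"\\W+">>, [{return, binary}]),
    [P || P <- Parts, P =/= <<>>].
```

For the input in this thread, all_words:with_run(<<"a,b,c">>) should give
[<<"a">>,<<"b">>,<<"c">>], and with_split/1 should give the same list.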


2013/12/28 Robert Virding <robert.virding@REDACTED>

> You are getting all the captures. By default the whole match is considered
> to be the first capture. If you don't want this then do:
>
> 1> re2:match(<<"a,b,c">>, <<"^(?:(\\w+)\\W+)*(\\w+)?$">>,
> [{capture,all_but_first,binary}]).
> {match,[<<"b">>,<<"c">>]}
>
> It mimics the re module in many ways.
>
> Robert
>
> ------------------------------
>
> *From: *"akonsu" <akonsu@REDACTED>
> *To: *"Anthony Molinaro" <anthonym@REDACTED>
> *Cc: *"Robert Virding" <robert.virding@REDACTED>,
> "erlang-questions" <erlang-questions@REDACTED>, "Steve Vinoski" <
> vinoski@REDACTED>
> *Sent: *Thursday, 26 December, 2013 6:42:19 PM
> *Subject: *Re: [erlang-questions] any way to speed up regex.split?
>
>
> I am trying to split by a regex, but I am getting the last two captures
> below, is there a way to get all captures?
>
> 1> re2:match(<<"a,b,c">>, <<"^(?:(\\w+)\\W+)*(\\w+)?$">>,
> [{capture,binary}]).
>
> {match,[<<"a,b,c">>,<<"b">>,<<"c">>]}
>
>
>
> 2013/12/23 Anthony Molinaro <anthonym@REDACTED>
>
>> Well there is
>>
>> https://github.com/tuncer/re2
>>
>> It is a NIF and works really well, we've had it in production for a
>> couple of years.
>>
>> -Anthony
>>
>> On Dec 23, 2013, at 6:07 AM, Robert Virding <
>> robert.virding@REDACTED> wrote:
>>
>> ------------------------------
>>
>> *From: *"Jesper Louis Andersen" <jesper.louis.andersen@REDACTED>
>>
>> On Sun, Dec 22, 2013 at 8:55 PM, Steve Vinoski <vinoski@REDACTED> wrote:
>>
>>> You can gain a slight speedup by specifying [{return,binary}] as the
>>> final argument to re:split/3, but since you're splitting on whitespace, why
>>> not use binary:split rather than re:split? The former appears to be 10x
>>> faster than the latter for this case.
>>
>>
>> This would be my approach as well. I tend to avoid regular expression
>> parsing if I can. The speed of the regex library is probably quite
>> dependent on the underlying regex engine. I would think the Ruby engine
>> (Oniguruma IIRC) is faster than the nice PCRE engine Erlang uses. There is
>> also the RE2 variant which uses a Thompson NFA and is faster for many
>> problems. But it has no direct Erlang-implementation.
>>
>>
>> It is faster and deterministic for any RE which needs backtracking; PCRE
>> can backtrack into oblivion. There should definitely be an re2 module. It
>> should be easier to implement as you don't have to worry about ensuring it
>> doesn't block too long.
>>
>> Robert
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>
>