[erlang-questions] any way to speed up regex.split?

akonsu akonsu@REDACTED
Thu Dec 26 18:42:19 CET 2013


I am trying to split with a regex, but I only get the last two captures,
as shown below. Is there a way to get all captures?

1> re2:match(<<"a,b,c">>, <<"^(?:(\\w+)\\W+)*(\\w+)?$">>,
[{capture,binary}]).

{match,[<<"a,b,c">>,<<"b">>,<<"c">>]}
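For reference: a capture group inside a repetition, such as the (\w+) in
(?:(\w+)\W+)*, keeps only the text from its last iteration, which is why
only <<"b">> and <<"c">> show up above. Below is a minimal sketch of
collecting every token with the standard re and binary modules instead. The
module and function names are hypothetical, and it assumes the tokens are
\w+ runs separated by non-word characters. Whether the re2 NIF has an
equivalent of the global option is not covered here.

```erlang
-module(split_sketch).
-export([tokens_re/1, tokens_bin/1, tokens_run/1]).

%% Split on runs of non-word characters with the stdlib re module.
%% {return, binary} avoids converting the pieces to lists.
tokens_re(Bin) ->
    re:split(Bin, <<"\\W+">>, [{return, binary}]).

%% Same idea for a fixed separator (assumed here to be a comma),
%% without involving a regex engine at all.
tokens_bin(Bin) ->
    binary:split(Bin, <<",">>, [global]).

%% Collect every \w+ run with a global match. With the global option,
%% re:run returns one list of captures per match; {capture, first, binary}
%% keeps only the whole match of each one.
tokens_run(Bin) ->
    case re:run(Bin, <<"\\w+">>, [global, {capture, first, binary}]) of
        {match, Matches} -> [M || [M] <- Matches];
        nomatch -> []
    end.
```

Called on <<"a,b,c">>, each of the three functions should return
[<<"a">>,<<"b">>,<<"c">>].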



2013/12/23 Anthony Molinaro <anthonym@REDACTED>

> Well there is
>
> https://github.com/tuncer/re2
>
> It is a NIF and works really well, we've had it in production for a couple
> of years.
>
> -Anthony
>
> On Dec 23, 2013, at 6:07 AM, Robert Virding <
> robert.virding@REDACTED> wrote:
>
> ------------------------------
>
> *From: *"Jesper Louis Andersen" <jesper.louis.andersen@REDACTED>
>
> On Sun, Dec 22, 2013 at 8:55 PM, Steve Vinoski <vinoski@REDACTED> wrote:
>
>> You can gain a slight speedup by specifying [{return,binary}] as the
>> final argument to re:split/3, but since you're splitting on whitespace, why
>> not use binary:split rather than re:split? The former appears to be 10x
>> faster than the latter for this case.
>
>
> This would be my approach as well. I tend to avoid regular expression
> parsing if I can. The speed of the regex library is probably quite
> dependent on the underlying regex engine. I would think the Ruby engine
> (Oniguruma IIRC) is faster than the nice PCRE engine Erlang uses. There is
> also the RE2 variant, which uses a Thompson NFA and is faster for many
> problems. But it has no direct Erlang implementation.
>
>
> It is faster and deterministic for any RE which needs backtracking; PCRE
> can backtrack into oblivion. There should definitely be an re2 module. It
> should be easier to implement as you don't have to worry about ensuring it
> doesn't block too long.
>
> Robert
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>