[erlang-questions] any way to speed up regex.split?

Robert Virding robert.virding@REDACTED
Sat Dec 28 19:29:15 CET 2013


You only get one value per capture group: when a group repeats, as (\\w+) does here, only its last match is kept.
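
One way around this, sketched here with the standard re and binary modules rather than re2 (whose option handling may differ), is to match the token pattern globally instead of repeating a group, or to split on the separator directly. The module name all_tokens and the functions words/1 and fields/1 are made up for illustration only.

```erlang
-module(all_tokens).
-export([words/1, fields/1]).

%% Collect every \w+ run with a global match instead of a repeated group.
%% With the global option, re:run/3 returns one capture list per match.
words(Subject) ->
    case re:run(Subject, <<"\\w+">>, [global, {capture, first, binary}]) of
        {match, Matches} -> [W || [W] <- Matches];
        nomatch -> []
    end.

%% When the separator is a fixed string (a comma is assumed here),
%% binary:split/3 avoids the regex engine entirely.
fields(Subject) ->
    binary:split(Subject, <<",">>, [global]).
```

For the subject in this thread, all_tokens:words(<<"a,b,c">>) and all_tokens:fields(<<"a,b,c">>) both return [<<"a">>,<<"b">>,<<"c">>].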

Robert 

----- Original Message -----

> From: "akonsu" <akonsu@REDACTED>
> To: "Robert Virding" <robert.virding@REDACTED>
> Cc: "Anthony Molinaro" <anthonym@REDACTED>, "erlang-questions"
> <erlang-questions@REDACTED>, "Steve Vinoski" <vinoski@REDACTED>
> Sent: Saturday, 28 December, 2013 4:13:43 PM
> Subject: Re: [erlang-questions] any way to speed up regex.split?

> Thanks. I am trying to get all captures for my first group (\\w+), which is
> inside the non-capturing group, but I am getting only the last capture. For
> this specific example I need to get [<<"a">>,<<"b">>,<<"c">>]

> konstantin

> 2013/12/28 Robert Virding < robert.virding@REDACTED >

> > You are getting all the captures. By default the whole match is considered
> > to be the first capture. If you don't want this then do:
> 

> > 1> re2:match(<<"a,b,c">>, <<"^(?:(\\w+)\\W+)*(\\w+)?$">>,
> > [{capture,all_but_first,binary}]).
> 
> > {match,[<<"b">>,<<"c">>]}
> 

> > It mimics the re module in many ways.
> 

> > Robert
> 

> > > From: "akonsu" < akonsu@REDACTED >
> > > To: "Anthony Molinaro" < anthonym@REDACTED >
> > > Cc: "Robert Virding" < robert.virding@REDACTED >,
> > > "erlang-questions" < erlang-questions@REDACTED >, "Steve Vinoski" <
> > > vinoski@REDACTED >
> > > Sent: Thursday, 26 December, 2013 6:42:19 PM
> > > Subject: Re: [erlang-questions] any way to speed up regex.split?

> > > I am trying to split by a regex, but I am getting only the last two
> > > captures below. Is there a way to get all captures?
> > 
> 

> > > 1> re2:match(<<"a,b,c">>, <<"^(?:(\\w+)\\W+)*(\\w+)?$">>,
> > > [{capture,binary}]).
> > 
> 

> > > {match,[<<"a,b,c">>,<<"b">>,<<"c">>]}
> > 
> 

> > > 2013/12/23 Anthony Molinaro < anthonym@REDACTED >
> > 
> 

> > > > Well there is
> > > 
> > 
> 

> > > > https://github.com/tuncer/re2
> > > 
> > 
> 

> > > > It is a NIF and works really well; we've had it in production for a
> > > > couple of years.
> > > 
> > 
> 

> > > > -Anthony
> > > 
> > 
> 

> > > > On Dec 23, 2013, at 6:07 AM, Robert Virding <
> > > > robert.virding@REDACTED > wrote:
> > > 
> > 
> 

> > > > > > From: "Jesper Louis Andersen" < jesper.louis.andersen@REDACTED >
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > On Sun, Dec 22, 2013 at 8:55 PM, Steve Vinoski < vinoski@REDACTED >
> > > > > > wrote:
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > > > You can gain a slight speedup by specifying [{return,binary}] as
> > > > > > > the final argument to re:split/3, but since you're splitting on
> > > > > > > whitespace, why not use binary:split rather than re:split? The
> > > > > > > former appears to be 10x faster than the latter for this case.
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > This would be my approach as well. I tend to avoid regular
> > > > > > expression parsing if I can. The speed of the regex library is
> > > > > > probably quite dependent on the underlying regex engine. I would
> > > > > > think the Ruby engine (Oniguruma IIRC) is faster than the nice PCRE
> > > > > > engine Erlang uses. There is also the RE2 variant, which uses a
> > > > > > Thompson NFA and is faster for many problems. But it has no direct
> > > > > > Erlang implementation.
> > > > > 
> > > > 
> > > 
> > 
> 

> > > > > It is faster and deterministic for any RE which needs backtracking;
> > > > > PCRE can backtrack into oblivion. There should definitely be an re2
> > > > > module. It should be easier to implement as you don't have to worry
> > > > > about ensuring it doesn't block too long.
> > > > 
> > > 
> > 
> 

> > > > > Robert
> > > > 
> > > 
> > 
> 

> > > > > _______________________________________________
> > > > 
> > > 
> > 
> 
> > > > > erlang-questions mailing list
> > > > 
> > > 
> > 
> 
> > > > > erlang-questions@REDACTED
> > > > 
> > > 
> > 
> 
> > > > > http://erlang.org/mailman/listinfo/erlang-questions
> > > > 
> > > 
> > 
> 
