[erlang-questions] any way to speed up regex.split?

Robert Virding
Sat Dec 28 15:58:35 CET 2013


You are getting all the captures: by default the whole match is returned as the first capture. If you don't want that, ask for all_but_first:

1> re2:match(<<"a,b,c">>, <<"^(?:(\\w+)\\W+)*(\\w+)?$">>, [{capture,all_but_first,binary}]). 
{match,[<<"b">>,<<"c">>]} 

re2 mimics the re module in many ways.
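
For comparison, the same option can be tried against the standard re module. This is a minimal sketch, assuming re:run/3 treats the capture option the way re2:match/3 does above; the module name capture_demo and its function names are made up for illustration.

```erlang
-module(capture_demo).
-export([with_whole_match/0, groups_only/0]).

%% Subject and pattern taken from this thread.
-define(SUBJECT, <<"a,b,c">>).
-define(PATTERN, <<"^(?:(\\w+)\\W+)*(\\w+)?$">>).

%% Value spec 'all' (the default): the whole match comes first,
%% followed by each parenthesised group.
with_whole_match() ->
    re:run(?SUBJECT, ?PATTERN, [{capture, all, binary}]).

%% Value spec 'all_but_first': the whole match is dropped and only
%% the parenthesised groups are returned.
groups_only() ->
    re:run(?SUBJECT, ?PATTERN, [{capture, all_but_first, binary}]).
```

capture_demo:with_whole_match() should give {match,[<<"a,b,c">>,<<"b">>,<<"c">>]}, and capture_demo:groups_only() should give {match,[<<"b">>,<<"c">>]}.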

Robert 

----- Original Message -----

> From: "akonsu"
> To: "Anthony Molinaro"
> Cc: "Robert Virding", "erlang-questions", "Steve Vinoski"
> Sent: Thursday, 26 December, 2013 6:42:19 PM
> Subject: Re: [erlang-questions] any way to speed up regex.split?

> I am trying to split by a regex, but I am only getting the last two captures
> below. Is there a way to get all captures?

> 1> re2:match(<<"a,b,c">>, <<"^(?:(\\w+)\\W+)*(\\w+)?$">>,
> [{capture,binary}]).

> {match,[<<"a,b,c">>,<<"b">>,<<"c">>]}
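
If the goal is every token rather than the pattern's two groups, note that a repeated group such as (\w+) inside (?:...)* keeps only its last iteration, which is why <<"a">> never shows up. One possible approach, sketched here with the standard re module, is to match the token itself repeatedly. The names tokens_demo and words/1 are hypothetical, and whether re2 offers an equivalent of the global option has not been checked here.

```erlang
-module(tokens_demo).
-export([words/1]).

%% Return every run of word characters in Subject as a list of binaries.
%% With 'global', re:run/3 returns one inner list per match; with
%% {capture, first, binary} each inner list holds just the matched text.
words(Subject) ->
    case re:run(Subject, <<"\\w+">>, [global, {capture, first, binary}]) of
        {match, Matches} -> [Word || [Word] <- Matches];
        nomatch -> []
    end.
```

tokens_demo:words(<<"a,b,c">>) should give [<<"a">>,<<"b">>,<<"c">>].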

> 2013/12/23 Anthony Molinaro

> > Well, there is
> >
> > https://github.com/tuncer/re2
> >
> > It is a NIF and works really well; we've had it in production for a couple
> > of years.
> >
> > -Anthony

> > On Dec 23, 2013, at 6:07 AM, Robert Virding wrote:

> > > > From: "Jesper Louis Andersen"

> > > > On Sun, Dec 22, 2013 at 8:55 PM, Steve Vinoski wrote:

> > > > > You can gain a slight speedup by specifying [{return,binary}] as the
> > > > > final argument to re:split/3, but since you're splitting on whitespace,
> > > > > why not use binary:split rather than re:split? The former appears to be
> > > > > 10x faster than the latter for this case.
> > > >
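
The two alternatives above can be sketched side by side. This assumes "whitespace" means spaces, tabs and newlines, that empty fields should be dropped, and OTP 18 or later for the trim_all option. The module split_demo and its function names are invented for the example, and no timing claim is made beyond the figure quoted above.

```erlang
-module(split_demo).
-export([with_re/1, with_binary/1]).

%% Regex-based split on runs of whitespace; {return, binary} keeps the
%% pieces as binaries instead of converting them to lists.
with_re(Line) ->
    re:split(Line, <<"\\s+">>, [{return, binary}]).

%% Literal split on single whitespace characters. 'global' splits at
%% every occurrence and 'trim_all' drops the empty pieces that runs of
%% whitespace would otherwise leave behind.
with_binary(Line) ->
    binary:split(Line, [<<" ">>, <<"\t">>, <<"\n">>], [global, trim_all]).
```

For an input such as <<"a  b\tc">> both functions should return [<<"a">>,<<"b">>,<<"c">>]. They can differ on leading or trailing whitespace, where re:split/3 may keep an empty binary that trim_all removes.
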
> > > > This would be my approach as well. I tend to avoid regular expression
> > > > parsing if I can. The speed of the regex library is probably quite
> > > > dependent on the underlying regex engine. I would think the Ruby engine
> > > > (Oniguruma, IIRC) is faster than the nice PCRE engine Erlang uses. There
> > > > is also the RE2 variant, which uses a Thompson NFA and is faster for many
> > > > problems. But it has no direct Erlang implementation.

> > > It is faster and deterministic for any RE which needs backtracking; PCRE
> > > can backtrack into oblivion. There should definitely be an re2 module. It
> > > should be easier to implement as you don't have to worry about ensuring it
> > > doesn't block too long.
> > >
> > > Robert

> > > _______________________________________________
> > > erlang-questions mailing list
> > > http://erlang.org/mailman/listinfo/erlang-questions
