[erlang-questions] any way to speed up regex.split?

Mon Dec 23 15:07:45 CET 2013

----- Original Message -----

> From: "Jesper Louis Andersen" <jesper.louis.andersen@REDACTED>

> On Sun, Dec 22, 2013 at 8:55 PM, Steve Vinoski < vinoski@REDACTED > wrote:

> > You can gain a slight speedup by specifying [{return,binary}] as the final
> > argument to re:split/3, but since you're splitting on whitespace, why not
> > use binary:split rather than re:split? The former appears to be 10x faster
> > than the latter for this case.
> 
> This would be my approach as well. I tend to avoid regular expression parsing
> if I can. The speed of the regex library is probably quite dependent on the
> underlying regex engine. I would think the Ruby engine (Oniguruma IIRC) is
> faster than the nice PCRE engine Erlang uses. There is also the RE2 variant
> which uses a Thompson NFA and is faster for many problems. But it has no
> direct Erlang implementation.

RE2 is faster and deterministic for any RE which needs backtracking, whereas PCRE can backtrack into oblivion. There should definitely be an re2 module. It should also be easier to implement, as you don't have to worry about ensuring that a match doesn't block for too long.
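
For reference, the two approaches discussed in the quoted text might look roughly like the sketch below. The module name split_sketch and the function names are made up for illustration, and the separator list is an assumption about which whitespace bytes matter for the input. Note that the two functions are not strictly equivalent, since \s in PCRE covers a few more characters than the four listed here.

```erlang
-module(split_sketch).
-export([words_re/1, words_bin/1]).

%% Regex version: split on runs of whitespace, returning binaries.
%% Leading or trailing whitespace produces empty binaries, so drop them.
words_re(Bin) when is_binary(Bin) ->
    [W || W <- re:split(Bin, "\\s+", [{return, binary}]), W =/= <<>>].

%% Non-regex version: split on each single whitespace byte.
%% Runs of separators produce empty binaries, so drop them as well.
words_bin(Bin) when is_binary(Bin) ->
    Seps = [<<" ">>, <<"\t">>, <<"\n">>, <<"\r">>],
    [W || W <- binary:split(Bin, Seps, [global]), W =/= <<>>].
```

With this, split_sketch:words_bin(<<"foo  bar\tbaz\n">>) returns [<<"foo">>,<<"bar">>,<<"baz">>]. Any speed difference between the two is best measured with timer:tc/3 on representative input rather than assumed.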

Robert 