[erlang-questions] any way to speed up regex.split?

Anthony Molinaro anthonym@REDACTED
Mon Dec 23 16:35:22 CET 2013


Well, there is

https://github.com/tuncer/re2

It is a NIF and works really well; we've had it in production for a couple of years.
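
Before reaching for a NIF, the stdlib route Steve suggests in the quoted thread below can be sketched roughly as follows. This is only an illustration, not a benchmark: the module and function names (split_sketch, with_re/1, with_binary/1, time_both/1) are made up for the example, the separator list is an assumption about what "whitespace" means for the input, and the trim_all option needs OTP 18 or later.

```erlang
-module(split_sketch).
-export([with_re/1, with_binary/1, time_both/1]).

%% Split on runs of whitespace using the regex engine.
%% {return, binary} keeps the fields as binaries; trim drops empty
%% fields at the end of the result.
with_re(Bin) when is_binary(Bin) ->
    re:split(Bin, "\\s+", [{return, binary}, trim]).

%% Split on the same input without the regex engine.
%% Assumption: only space, tab, LF and CR count as whitespace here,
%% which is narrower than what \s matches.
%% trim_all drops every empty field, so runs of separators collapse.
with_binary(Bin) when is_binary(Bin) ->
    Separators = [<<" ">>, <<"\t">>, <<"\n">>, <<"\r">>],
    binary:split(Bin, Separators, [global, trim_all]).

%% Time one call of each variant, in microseconds.
%% A single call is noisy; loop over real data for anything serious.
time_both(Bin) when is_binary(Bin) ->
    {ReMicros, _} = timer:tc(?MODULE, with_re, [Bin]),
    {BinMicros, _} = timer:tc(?MODULE, with_binary, [Bin]),
    [{re_split_us, ReMicros}, {binary_split_us, BinMicros}].
```

Note that the two variants are not identical on every input: with leading whitespace, with_re/1 returns an empty first field, while with_binary/1 drops it. Calling split_sketch:time_both/1 on a representative line should show whether the difference Steve reports holds for your data.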

-Anthony

On Dec 23, 2013, at 6:07 AM, Robert Virding <robert.virding@REDACTED> wrote:

> 
> From: "Jesper Louis Andersen" <jesper.louis.andersen@REDACTED>
> 
> On Sun, Dec 22, 2013 at 8:55 PM, Steve Vinoski <vinoski@REDACTED> wrote:
>> You can gain a slight speedup by specifying [{return,binary}] as the final argument to re:split/3, but since you're splitting on whitespace, why not use binary:split rather than re:split? The former appears to be 10x faster than the latter for this case.
> 
> This would be my approach as well. I tend to avoid regular expression parsing if I can. The speed of a regex library probably depends heavily on the underlying engine. I would think the Ruby engine (Oniguruma, IIRC) is faster than the nice PCRE engine Erlang uses. There is also RE2, which uses a Thompson NFA and is faster for many problems, but it has no direct Erlang implementation.
> 
> RE2 is faster and deterministic for any RE which needs backtracking; PCRE can backtrack into oblivion. There should definitely be an re2 module. It should be easier to implement, as you don't have to worry about ensuring it doesn't block for too long.
> 
> Robert