[erlang-questions] Regular grammars and binary matching performance

Wed Oct 6 14:54:18 CEST 2010

  Tuncer Ayaz has built an interface to the re2 library using NIFs, I 
think it works on both lists and binaries. Implementing true POSIX 
regular expressions in erlang is not that difficult, I have done one 
using the same algorithms as in re2 which works on lists and binaries 
(but not io_lists). I haven't measured it against re2 but I assume it is 
slower, but it does have the benefit of being guaranteed portable to all 
systems. Using capturing sub-expressions is costly though.

While I agree that these regexps would definitely be useful, and easier 
to use because they are simpler, I don't think the interface should be 
included in the binary module. It would be better to have a separate 
module, for example re2 or pos_re, and have them work on both lists and 
binaries. The extra runtime cost would be negligible.

One benefit of using re2, or something based on the same algorithms, is 
that you avoid having backtracking as in PCRE/Perl which can go 
pathological for some regular expressions.

Robert

On 2010-10-06 13.35, Hynek Vychodil wrote:
> Hello,
> I have just send new regexdna solution using R14's binary module:
> http://shootout.alioth.debian.org/u32q/performance.php?test=regexdna
> http://shootout.alioth.debian.org/u64q/performance.php?test=regexdna
> It shows how big gain we would have get by limited regular expression
> support in binary module. I have written limited regular expression
> support in function compile_pattern/1 (very limited and poor man
> solution!) as demonstration there
> http://shootout.alioth.debian.org/u32q/program.php?test=regexdna&lang=hipe&id=7
> for curiosity.
> I think it would be nice to have support for regular grammars in
> binary module or adapthttp://code.google.com/p/re2/  as limited
> regular expression library because those limited regular expression
> (truly regular grammars, without sub-capturing may be and so) could be
> feasible in real projects. For comparison, re module is 15 times
> slower in this benchmark
> http://shootout.alioth.debian.org/u32q/program.php?test=regexdna&lang=hipe&id=6
>
> Do you think it would be useful? Is anybody working on it already?
>
> Best regards