[erlang-questions] Regular grammars and binary matching performance
Robert Virding
robert.virding@REDACTED
Wed Oct 6 14:54:18 CEST 2010
Tuncer Ayaz has built an interface to the re2 library using NIFs, I
think it works on both lists and binaries. Implementing true POSIX
regular expressions in erlang is not that difficult, I have done one
using the same algorithms as in re2 which works on lists and binaries
(but not io_lists). I haven't measured it against re2 but I assume it is
slower, but it does have the benefit of being guaranteed portable to all
systems. Using capturing sub-expressions is costly though.
While I agree that these regexps would definitely be useful, and easier
to use because they are simpler, I don't think the interface should be
included in the binary module. It would be better to have a separate
module, for example re2 or pos_re, and have them work on both lists and
binaries. The extra runtime cost would be negligible.
One benefit of using re2, or something based on the same algorithms, is
that you avoid having backtracking as in PCRE/Perl which can go
pathological for some regular expressions.
Robert
On 2010-10-06 13.35, Hynek Vychodil wrote:
> Hello,
> I have just send new regexdna solution using R14's binary module:
> http://shootout.alioth.debian.org/u32q/performance.php?test=regexdna
> http://shootout.alioth.debian.org/u64q/performance.php?test=regexdna
> It shows how big gain we would have get by limited regular expression
> support in binary module. I have written limited regular expression
> support in function compile_pattern/1 (very limited and poor man
> solution!) as demonstration there
> http://shootout.alioth.debian.org/u32q/program.php?test=regexdna&lang=hipe&id=7
> for curiosity.
> I think it would be nice to have support for regular grammars in
> binary module or adapthttp://code.google.com/p/re2/ as limited
> regular expression library because those limited regular expression
> (truly regular grammars, without sub-capturing may be and so) could be
> feasible in real projects. For comparison, re module is 15 times
> slower in this benchmark
> http://shootout.alioth.debian.org/u32q/program.php?test=regexdna&lang=hipe&id=6
>
> Do you think it would be useful? Is anybody working on it already?
>
> Best regards
More information about the erlang-questions
mailing list