[erlang-questions] Regular grammars and binary matching performance

Wed Oct 6 15:55:05 CEST 2010

On Wed, Oct 6, 2010 at 2:54 PM, Robert Virding
<robert.virding@REDACTED> wrote:
>  Tuncer Ayaz has built an interface to the re2 library using NIFs, I think
> it works on both lists and binaries. Implementing true POSIX regular
> expressions in erlang is not that difficult, I have done one using the same
> algorithms as in re2 which works on lists and binaries (but not io_lists). I
> haven't measured it against re2 but I assume it is slower, but it does have
> the benefit of being guaranteed portable to all systems. Using capturing
> sub-expressions is costly though.

I have been thinking about parse transformation and generation of code
of DFA before binary module introduced. I thought that with HiPE
compilation result should be competitive. But binary module changed
game. When R14A released I have written above proof of concept and
than haven't found time to try previous approach. I think you are
right that Erlang native implementation would be nice.

>
> While I agree that these regexps would definitely be useful, and easier to
> use because they are simpler, I don't think the interface should be included
> in the binary module. It would be better to have a separate module, for
> example re2 or pos_re, and have them work on both lists and binaries. The
> extra runtime cost would be negligible.
>
> One benefit of using re2, or something based on the same algorithms, is that
> you avoid having backtracking as in PCRE/Perl which can go pathological for
> some regular expressions.
>
> Robert
>
>
> On 2010-10-06 13.35, Hynek Vychodil wrote:
>>
>> Hello,
>> I have just send new regexdna solution using R14's binary module:
>> http://shootout.alioth.debian.org/u32q/performance.php?test=regexdna
>> http://shootout.alioth.debian.org/u64q/performance.php?test=regexdna
>> It shows how big gain we would have get by limited regular expression
>> support in binary module. I have written limited regular expression
>> support in function compile_pattern/1 (very limited and poor man
>> solution!) as demonstration there
>>
>> http://shootout.alioth.debian.org/u32q/program.php?test=regexdna&lang=hipe&id=7
>> for curiosity.
>> I think it would be nice to have support for regular grammars in
>> binary module or adapthttp://code.google.com/p/re2/  as limited
>> regular expression library because those limited regular expression
>> (truly regular grammars, without sub-capturing may be and so) could be
>> feasible in real projects. For comparison, re module is 15 times
>> slower in this benchmark
>>
>> http://shootout.alioth.debian.org/u32q/program.php?test=regexdna&lang=hipe&id=6
>>
>> Do you think it would be useful? Is anybody working on it already?
>>
>> Best regards
>

-- 
--Hynek (Pichi) Vychodil

Analyze your data in minutes. Share your insights instantly. Thrill
your boss.  Be a data hero!
Try GoodData now for free: www.gooddata.com