[eeps] EEP 9

Mats Cronqvist masse@REDACTED
Thu Mar 6 09:52:44 CET 2008


Fredrik Svahn wrote:
>>  i'll backpedal a bit here. what i really think should be avoided is
>> backtracking. quoting from Cox (http://swtch.com/~rsc/regexp/regexp1.html);
>>  "Regular expression matching can be simple and fast, using finite
>> automata-based techniques that have been known for decades. In contrast,
>> Perl, PCRE, Python, Ruby, Java, and many other languages have regular
>> expression implementations based on recursive backtracking that are
>> simple but can be excruciatingly slow."
>>     
>
> This is indeed an excellent article, and I was quite concerned when I
> read it as well. After looking a bit further into PCRE I realised that
> PCRE seems to come with two alternative matching functions, an NFA
> based and a DFA based
>
> The NFA based clearly exhibits exponential behaviour as expected for
> the nasty search string used as an example in the article. It will
> eventually return an error -8 which means the backtracking limit has
> been reached:
>
> $ time ./pcredemo "(a?){19}a{19}" "aaaaaaaaaaaaaaaaaaa"
> Match succeeded at offset 0
> real 0m0.270s user 0m0.270s sys 0m0.000s
> $ time ./pcredemo "(a?){20}a{20}" "aaaaaaaaaaaaaaaaaaaa"
> Match succeeded at offset 0
> real 0m0.519s user 0m0.520s sys 0m0.000s
> $ time ./pcredemo "(a?){21}a{21}" "aaaaaaaaaaaaaaaaaaaaa"
> Match succeeded at offset 0
> real 0m1.039s user 0m1.030s sys 0m0.000s
> $ time ./pcredemo "(a?){22}a{22}" "aaaaaaaaaaaaaaaaaaaaaa"
> Matching error -8
> real 0m1.652s user 0m1.650s sys 0m0.000s
>
> The DFA based function, however, does not seem to exhibit any
> exponential behaviour:
>
> $ time ./pcredemo_dfa "(a?){21}a{21}" "aaaaaaaaaaaaaaaaaaaaa"
> Match succeeded at offset 0
> real 0m0.005s user 0m0.000s sys 0m0.000s
> $ time ./pcredemo_dfa "(a?){22}a{22}" "aaaaaaaaaaaaaaaaaaaaaa"
> Match succeeded at offset 0
> real 0m0.005s user 0m0.000s sys 0m0.000s
> $ time ./pcredemo_dfa "(a?){23}a{23}" "aaaaaaaaaaaaaaaaaaaaaaa"
> Match succeeded at offset 0
> real 0m0.006s user 0m0.010s sys 0m0.000s
> $ time ./pcredemo_dfa "(a?){30}a{30}" "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
> Match succeeded at offset 0
> real 0m0.005s user 0m0.000s sys 0m0.010s
>
> Maybe there should be an option { dfa | nfa } making it possible to
> indicate which of the algorithms you want to use? The NFA one has more
> features according to the manual pages: http://www.pcre.org/pcre.txt

  upon further consideration; defaulting to DFA, with the option of 
dropping into NFA, would be enough to shut me up.

  mats



More information about the eeps mailing list