[erlang-questions] Regular Expressions Problems

Michael Santos michael.santos@REDACTED
Tue Apr 20 03:49:19 CEST 2010


On Mon, Apr 19, 2010 at 01:03:13PM +0100, Gordon Guthrie wrote:
> Folks
> 
> I think I may have identified a regular expression bug in re.
> 
> The following code never terminates in R13B-04:
> 
> -module(fail).
> 
> -export([fail/0]).
> 
> fail() ->
>     Str = "http:/www.flickr.com/slideShow/index.gne?group_id=&user_id=69845378@REDACTED",
>     %% Note: in an Erlang string "\." is read as a plain ".", so each
>     %% \. below reaches PCRE as an unescaped dot (any character), as in
>     %% the pcretest line further down; "\\." would give a literal dot.
>     EMail_regex = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+"
>         ++ "(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
>         ++ "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"
>         ++ "(?:[a-zA-Z]{2}|com|org|net|gov|mil"
>         ++ "|biz|info|mobi|name|aero|jobs|museum)",
>     io:format("about to run...~n"),
>     Ret = re:run(Str, EMail_regex),
>     io:format("Ret is ~p~n", [Ret]).
> 
> Eliminate the @ from either the string or the regex and it will
> terminate, but if you don't, it won't...

$ pcretest
PCRE version 7.4 2007-09-21

  re> /[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+(?:[a-zA-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)/
  data> http:/www.flickr.com/slideShow/index.gne?group_id=&user_id=69845378@REDACTED
  Error -8

Error -8 (PCRE_ERROR_MATCHLIMIT) is returned when the number of calls
to PCRE's internal match() function exceeds the match limit (10000000
by default). The comments in the header file explain that "the limit
exists in order to catch runaway regular expressions that take for
ever to determine that they do not match."
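The following minimal C sketch makes the call counter concrete. It is not PCRE's code. It is a hypothetical backtracking matcher for the pattern (?:a|aa)*b that counts its own calls in the same way PCRE's match() does. Once the count passes a limit, it gives up with -8. The pattern, the struct and the function names are illustrative assumptions. Only the value -8 and the default limit of 10000000 come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; -8 mirrors PCRE_ERROR_MATCHLIMIT. */
#define MATCH_OK        1
#define MATCH_NOMATCH   0
#define MATCH_LIMIT   (-8)

/* Hypothetical per-run state, loosely modelled on PCRE's match data. */
struct match_data {
    unsigned long call_count;   /* times match() has been entered */
    unsigned long match_limit;  /* give up once call_count exceeds this */
};

/* Backtracking matcher for (?:a|aa)*b anchored at s. On a run of 'a's
   with no 'b', every split of the run into pieces of length 1 and 2 is
   tried, so the number of calls grows exponentially with its length. */
int match(const char *s, struct match_data *md)
{
    if (++md->call_count > md->match_limit)
        return MATCH_LIMIT;            /* runaway: stop searching */
    if (*s == 'b')
        return MATCH_OK;               /* star stops here, 'b' matches */
    if (*s != 'a')
        return MATCH_NOMATCH;
    int rc = match(s + 1, md);         /* first alternative: "a" */
    if (rc != MATCH_NOMATCH)
        return rc;                     /* success or limit: propagate */
    if (s[1] == 'a')
        return match(s + 2, md);       /* backtrack: try "aa" */
    return MATCH_NOMATCH;
}

/* Hypothetical driver: runs one match with a fresh counter and reports
   how many calls it used (if calls is not NULL). */
int run_match(const char *subject, unsigned long limit,
              unsigned long *calls)
{
    assert(subject != NULL);
    struct match_data md = { 0, limit };
    int rc = match(subject, &md);
    if (calls != NULL)
        *calls = md.call_count;
    return rc;
}
```

Consider a subject of 10 'a's followed by 'c'. The sketch gives up after 232 calls and reports no match. With 40 'a's, the full search would need hundreds of millions of calls. With a limit of 10000000, it therefore stops with -8 instead.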

In Erlang, the regexp match is interrupted after it has performed a
certain number of operations, and it is swapped out. When matching
resumes, the match() call counter is reset to zero. As a result, a
long-running match never accumulates enough calls to reach the limit.
I'm not sure why this is done, but removing the reset at least allows
the match to return:

1> fail:fail().
about to run...
Ret is nomatch
ok


diff --git a/erts/emulator/pcre/pcre_exec.c b/erts/emulator/pcre/pcre_exec.c
index 5162513..3fe13ca 100644
--- a/erts/emulator/pcre/pcre_exec.c
+++ b/erts/emulator/pcre/pcre_exec.c
@@ -5191,7 +5191,6 @@ for(;;)
       EDEBUGF(("Loop limit break detected"));
       return PCRE_ERROR_LOOP_LIMIT;
   RESTART_INTERRUPTED:
-      md->match_call_count = 0; 
       md->loop_limit = extra_data->loop_limit;
       rc = match(NULL,NULL,NULL,0,md,0,NULL,0,0);
       *extra_data->loop_counter_return = 
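
The effect of zeroing the counter on every resume can be modelled with a small C simulation. This is a hypothetical sketch, not the erts code. run_sliced() does not match anything. It performs `work` counted calls in slices of at most `slice` calls, standing in for the interrupt/resume cycle around RESTART_INTERRUPTED. Between slices it optionally resets the counter, as the removed `md->match_call_count = 0;` line does. All names and numbers here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NOMATCH   0
#define SIM_LIMIT   (-8)   /* mirrors PCRE_ERROR_MATCHLIMIT */

/* Simulation only: a match needing `work` calls in total runs in slices
   of at most `slice` calls; after each incomplete slice it is
   "interrupted" and resumed. If reset_on_resume is nonzero, the call
   counter is zeroed on every resume. */
int run_sliced(unsigned long work, unsigned long slice,
               unsigned long limit, int reset_on_resume,
               unsigned long *resumes)
{
    unsigned long call_count = 0;  /* plays the role of match_call_count */
    unsigned long done = 0;        /* calls performed so far in total */

    assert(slice > 0 && resumes != NULL);
    *resumes = 0;
    while (done < work) {
        unsigned long in_slice = 0;
        while (in_slice < slice && done < work) {
            if (++call_count > limit)
                return SIM_LIMIT;      /* the limit trips: give up */
            in_slice++;
            done++;
        }
        if (done < work) {             /* interrupted; will be resumed */
            (*resumes)++;
            if (reset_on_resume)
                call_count = 0;        /* the line the patch removes */
        }
    }
    return SIM_NOMATCH;                /* all work done, nothing matched */
}
```

Take a limit of 30 calls and slices of 10 calls. With the reset, a 50-call match runs to completion after 4 resumes, because no single slice ever exceeds the limit. Without the reset, the limit trips on the 31st call. For an exponential search like the one above, the reset version would simply keep resuming for as long as the search lasts.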

