Regular Expressions Problems

Gordon Guthrie gordon@REDACTED
Mon Apr 19 14:03:13 CEST 2010


Folks

I think I may have identified a regular expression bug in re.

The following code never terminates in R13B-04:

-module(fail).

-export([fail/0]).

fail() ->
    Str = "http:/www.flickr.com/slideShow/index.gne?group_id=&user_id=69845378@REDACTED",
    EMail_regex = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+"
        ++ "(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
        ++ "@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"
        ++ "(?:[a-zA-Z]{2}|com|org|net|gov|mil"
        ++ "|biz|info|mobi|name|aero|jobs|museum)",
    io:format("about to run...~n"),
    Ret = re:run(Str, EMail_regex),
    io:format("Ret is ~p~n", [Ret]).

If you eliminate the @ from either the string or the regex, the call
terminates - but if you leave it in, it won't...

There is a comment about the behaviour of '@' in Perl regular
expressions in the docos:

> If you want to remove the special meaning from a sequence of characters, you can do so by putting them between \Q and \E.
> This is different from Perl in that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in Perl, $ and @ cause variable interpolation.
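
For what it's worth, the quoted passage can be sketched like this. The module name qe_demo and both function names are made up for illustration and are not part of the failing code above; the sketch only assumes that re:run/2 accepts \Q...\E as the docs describe. Note that the backslashes are doubled in the Erlang string literals so that the regex engine actually sees \Q and \E.

```erlang
-module(qe_demo).

-export([literal_at/0, literal_dollar/0]).

%% Hypothetical helpers, for illustration only.

%% '@' between \Q and \E is matched as a literal character.
literal_at() ->
    re:run("user@example", "user\\Q@\\Eexample").

%% '$' between \Q and \E is a literal too, not an end-of-line anchor.
literal_dollar() ->
    re:run("cost$5", "\\Q$5\\E").
```

With those definitions, qe_demo:literal_at() should return {match,[{0,12}]} and qe_demo:literal_dollar() should return {match,[{4,2}]}, so quoting the @ this way does not by itself explain the hang.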

Jamie Zawinski's famous comment springs to mind:

> Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Gordon

