[erlang-questions] run strange behaviour

Erik Søe Sørensen eriksoe@REDACTED
Thu Oct 24 15:36:51 CEST 2013


Being greedy or not shouldn't change whether the regex matches.
I believe the issue is something else:
It's bad practice to repeat something that can match the empty
string - such as (\w* *)* - because the empty match could be repeated any
number of times.
Indeed, the original regex runs pretty slowly:

8> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or">>,
<<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3],
binary}]]).
{1052818,
 {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}
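To make the "matches the empty string" point concrete, here is a minimal
sketch for the Erlang shell. It only uses re:run/3 with {capture, none}, so
the result is just match or nomatch. The second pattern is an illustrative
variant of the inner group that requires at least one word character.

```erlang
%% Paste into the Erlang shell.
%% {capture, none} makes re:run/3 return just match | nomatch.
Empty = <<>>.

%% Inner group of the original regex: both \w* and " *" may match nothing,
%% so the group as a whole matches the empty string.
match = re:run(Empty, <<"^(\\w* *)$">>, [{capture, none}]).

%% Illustrative variant that requires at least one word character:
%% it cannot match the empty string.
nomatch = re:run(Empty, <<"^(\\w+ *)$">>, [{capture, none}]).
```

Besides the empty match, \w+ followed by repeated \w* lets a word such as
"bar" be split between iterations in several ways ("bar", "ba"+"r",
"b"+"ar", ...), and a backtracking matcher may try each split before giving
up.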

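My guess, therefore, is that on the longer input string the regexp hits some
limit on how much work the matcher will do (PCRE, which re is built on, has
such a match limit), and that this is what shows up as nomatch.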
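One way to test this guess, as a sketch: re:run/3 accepts a report_errors
option that returns runtime errors as {error, Reason} instead of folding
them into nomatch. This assumes an OTP release whose re module supports
report_errors; the variable names are only for illustration.

```erlang
%% Paste into the Erlang shell.
Subject = <<"foo bar is a foo bar is a big yellow boat or sub">>.
Pattern = <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>.

%% report_errors: return {error, Reason} for runtime errors rather than
%% reporting them as nomatch.
Result = re:run(Subject, Pattern,
                [report_errors, global, {capture, [1,3], binary}]).

%% If the guess is right, Result should be {error, match_limit} or
%% {error, match_limit_recursion}. A plain nomatch would point elsewhere.
```

If the limit does turn out to be the cause, the {match_limit, N} option can
raise it for a single call, although rewriting the pattern is the better
fix.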

Fixing the regex to not have repetitions of something that matches "" helps
both the run time:

9> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or">>,
<<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>, [global, {capture,
[1,3], binary}]]).
{14315,
 {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}
% 74x faster :-)

and the result for the longer input string:

10> timer:tc(re, run,
[<<"foo bar is a foo bar is a big yellow boat or sub">>,
<<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>,
[global, {capture, [1,3], binary}]]).
{49911,
 {match,[[<<"bar is a foo bar">>,
          <<"big yellow boat or sub">>]]}}
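For anyone wanting to keep the rewritten pattern around, a small sketch that
compiles it once with re:compile/1 and checks both subjects against the
results reported in this message. The variable names are illustrative.

```erlang
%% Paste into the Erlang shell.
%% Compile the rewritten pattern once and reuse it for both subjects.
{ok, MP} = re:compile(<<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>).
Opts = [global, {capture, [1,3], binary}].

Short = <<"foo bar is a foo bar is a big yellow boat or">>.
Long  = <<"foo bar is a foo bar is a big yellow boat or sub">>.

%% The expected values are the results shown in this message; a badmatch
%% error here means the pattern no longer behaves as reported.
{match, [[<<"bar is a foo bar">>, <<"big yellow boat or">>]]} =
    re:run(Short, MP, Opts).
{match, [[<<"bar is a foo bar">>, <<"big yellow boat or sub">>]]} =
    re:run(Long, MP, Opts).
```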

/Erik


2013/10/24 Vyacheslav Levytskyy <v.levytskyy@REDACTED>

>  Hello,
>
> According to the 're' module documentation, "the quantifiers are "greedy",
> that is, they match as much as possible (up to the maximum number of
> permitted times)". This seems to be the problem in your case. The regex you
> are using is problematic because it forces 're' into exhaustive repetition.
>
> As an alternative, you can use the 'ungreedy' option, which inverts the
> greediness of the quantifiers: they are non-greedy by default and become
> greedy only when followed by "?". See for example:
> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>,
> <<"^foo (\\w(\\w+| )*) is a (\\w(\\w+?| )*?)">>,
> [ungreedy, global, {capture, [1,3], binary}]).
> {match,[[<<"bar">>,
>          <<"foo bar is a big yellow boat or sub">>]]}
>
> Best regards,
> Vyacheslav Levytskyy
>
>
> On 23.10.2013 22:26, Alexander Petrovsky wrote:
>
> Hi!
>
>  I have the regex "^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)", and I get
> strange behaviour when I do:
>
>  1> re:run(<<"foo bar is a foo bar is a big yellow boat or">>,
> <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
> [global, {capture, [1,3], binary}]).
> {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}
>
>  2> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>,
> <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3],
> binary}]).
> nomatch
>
>  I tested this regexp in clojure and python:
>
>  => (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)" "foo bar is a foo
> bar is a big yellow boat or")
> ["foo bar is a foo bar is a big yellow boat or" "bar is a foo bar" "" "big
> yellow boat or" ""]
>
>  => (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)" "foo bar is a foo
> bar is a big yellow boat or sub")
> ["foo bar is a foo bar is a big yellow boat or sub" "bar is a foo bar" ""
> "big yellow boat or sub" ""]
>
>  >>> import re
> >>> p = re.compile('foo (\w+(\w* *)*) is a (\w+(\w* *)*)')
> >>> p.match("foo bar is a foo bar is a big yellow boat or")
> <_sre.SRE_Match object at 0x100293c00>
> >>> p.match("foo bar is a foo bar is a big yellow boat or sub")
> <_sre.SRE_Match object at 0x100293ab0>
>
>  Can someone explain why I get nomatch on the second string, "foo bar is a
> foo bar is a big yellow boat or sub"? Is this a bug?
>
>
>  --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Jabber: juise@REDACTED
> Phone: +7 914 8 820 815 (irkutsk)
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>