[erlang-questions] run strange behaviour
Erik Søe Sørensen
eriksoe@REDACTED
Thu Oct 24 15:36:51 CEST 2013
Being greedy or not shouldn't change whether the regex matches or not.
I believe the issue is something else:
It's bad practice to repeat a subpattern that can match the empty
string, such as (\w* *)*, because the empty match can be repeated any
number of times, which gives a backtracking matcher a huge number of
equivalent paths to try.
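
To make the "any number of times" point concrete, here is a small sketch in
Python (the language of one of the comparisons quoted below). It does not run
a regex at all; it only counts how many ways a run of n word characters can be
cut into consecutive non-empty pieces, which is how many ways a nested
repetition such as (\w+)* can consume that run. The name count_splits is made
up for this illustration, and real engines prune some paths, so treat the
numbers as intuition, not as a measurement of PCRE.

```python
from functools import lru_cache


def count_splits(n: int) -> int:
    """Count the ways to cut n word characters into consecutive non-empty pieces."""

    @lru_cache(maxsize=None)
    def ways(remaining: int) -> int:
        if remaining == 0:
            return 1  # nothing left to consume: the repetition simply stops
        # The next piece takes k characters; the rest is split recursively.
        return sum(ways(remaining - k) for k in range(1, remaining + 1))

    return ways(n)


# The count doubles with every extra character (2 ** (n - 1) for n >= 1).
for n in (3, 6, 12, 24):
    print(n, count_splits(n))
```

A subpattern like (\w* *)* is worse than this count suggests, since it also
allows empty pieces between the non-empty ones.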
Indeed, the original regex runs pretty slowly:
8> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or">>,
<<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3],
binary}]]).
{1052818,
{match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}
My guess, therefore, is that on the longer input string the regexp exceeds
some internal limit on matching effort, so the run is abandoned and reported
as nomatch.
Fixing the regex to not have repetitions of something that matches "" helps
both the run time:
9> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or">>,
<<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>, [global, {capture,
[1,3], binary}]]).
{14315,
{match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}
% 74x faster :-)
and the result for the longer input string:
10> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or sub">>,
<<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>,
[global, {capture, [1,3], binary}]]).
{49911,
{match,[[<<"bar is a foo bar">>,
<<"big yellow boat or sub">>]]}}
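
The rewritten pattern can also be checked in Python's re module, which is a
backtracking matcher too but is not the PCRE engine behind Erlang's re, so
timings will differ. The sketch below additionally tries a stricter variant
that is an assumption of this sketch, not something proposed in the thread: it
requires at least one space between words and never lets the repeated group
start in the middle of a word, so each word can be consumed in only one way.
The names REWRITTEN, STRICT and captures are hypothetical.

```python
import re

SHORT = "foo bar is a foo bar is a big yellow boat or"
LONG = SHORT + " sub"

# Pattern from the reply above, written as a Python raw string.
REWRITTEN = re.compile(r"^foo (\w+ *(\w+ *)*) is a (\w+ *(\w+ *)*)")

# Hypothetical stricter variant (not from the thread): words separated by
# one or more spaces, with a non-capturing group for the repetition.
STRICT = re.compile(r"^foo (\w+(?: +\w+)*) is a (\w+(?: +\w+)*)")


def captures(pattern, text, groups):
    """Return the requested capture groups as a tuple, or None if no match."""
    m = pattern.match(text)
    return None if m is None else tuple(m.group(g) for g in groups)


for text in (SHORT, LONG):
    # REWRITTEN has nested capturing groups, so the outer ones are 1 and 3.
    print(captures(REWRITTEN, text, (1, 3)))
    # STRICT has only two capturing groups.
    print(captures(STRICT, text, (1, 2)))
```

Both patterns give ("bar is a foo bar", "big yellow boat or") for the short
input and ("bar is a foo bar", "big yellow boat or sub") for the long one.
Note that the stricter variant changes what is accepted (no trailing spaces
inside a capture), so it is not a drop-in replacement.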
/Erik
2013/10/24 Vyacheslav Levytskyy <v.levytskyy@REDACTED>
> Hello,
>
> According to the 're' module documentation, "the quantifiers are "greedy",
> that is, they match as much as possible (up to the maximum number of
> permitted times)". This seems to be the problem in your case. The regex you
> are using is a bit problematic, forcing 're' into exhaustive repetitions.
>
> As an option, you can use the 'ungreedy' option, which makes quantifiers
> non-greedy by default; following a quantifier with "?" then makes that one
> greedy again. See for example:
> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>,
> <<"^foo (\\w(\\w+| )*) is a (\\w(\\w+?| )*?)">>,
> [ungreedy, global, {capture, [1,3], binary}]).
> {match,[[<<"bar">>,
> <<"foo bar is a big yellow boat or sub">>]]}
>
> Best regards,
> Vyacheslav Levytskyy
>
>
> On 23.10.2013 22:26, Alexander Petrovsky wrote:
>
> Hi!
>
> I have the regex "^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)", and I get
> strange behaviour when I do:
>
> 1> re:run(<<"foo bar is a foo bar is a big yellow boat or">>,
> <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
> [global, {capture, [1,3], binary}]).
> {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}
>
> 2> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>,
> <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3],
> binary}]).
> nomatch
>
> I tested this regexp in clojure and python:
>
> => (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)" "foo bar is a foo bar is a big yellow boat or")
> ["foo bar is a foo bar is a big yellow boat or" "bar is a foo bar" "" "big yellow boat or" ""]
>
> => (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)" "foo bar is a foo bar is a big yellow boat or sub")
> ["foo bar is a foo bar is a big yellow boat or sub" "bar is a foo bar" "" "big yellow boat or sub" ""]
>
> >>> import re
> >>> p = re.compile('foo (\w+(\w* *)*) is a (\w+(\w* *)*)')
> >>> p.match("foo bar is a foo bar is a big yellow boat or")
> <_sre.SRE_Match object at 0x100293c00>
> >>> p.match("foo bar is a foo bar is a big yellow boat or sub")
> <_sre.SRE_Match object at 0x100293ab0>
>
> Can someone explain why I get nomatch on the second string, "foo bar is a
> foo bar is a big yellow boat or sub"? Is this a bug?
>
>
> --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Jabber: juise@REDACTED
> Phone: +7 914 8 820 815 (irkutsk)
>
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>