<div dir="ltr">Thanks for the explanation.<div class="gmail_extra"><br><br><div class="gmail_quote">2013/10/24 Vyacheslav Levytskyy <span dir="ltr"><<a href="mailto:v.levytskyy@yahoo.com" target="_blank">v.levytskyy@yahoo.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Agreed, this is similar to what I
wrote: the initial regex was the problem. The exact value of the
limit is not so important; from the documentation we know that
it exists ("maximum number of permitted times"), and in any case it is
the regex that should be fixed.<span><font color="#888888"><br>
<br>
Vyacheslav</font></span><div><div><br>
<br>
On 24.10.2013 16:36, Erik Søe Sørensen wrote:<br>
</div></div></div><div><div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>Being greedy or not shouldn't change whether the
regex matches or not.<br>
</div>
I believe the issue is something else...:<br>
</div>
It's bad practice to have repetitions of something that
matches the empty string - such as (\w* *)* - because
the empty match can be repeated any number of times, which gives
the matcher a huge number of equivalent ways to backtrack.<br>
</div>
Indeed, the original regex runs pretty slowly:<br>
<br>
8> timer:tc(re, run, [<<"foo bar is a foo bar is
a big yellow boat or">>, <<"^foo (\\w+(\\w*
*)*) is a (\\w+(\\w* *)*)">>, [global, {capture,
[1,3], binary}]]). <br>
{1052818,<br>
{match,[[<<"bar is a foo bar">>,<<"big
yellow boat or">>]]}}<br>
<br>
</div>
My guess, therefore, is that the regexp times out/reaches
some run-duration limit on the longer input string.<br>
<br>
</div>
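<div>One way to test that guess, as a minimal sketch: pass the report_errors option so that re:run/3 returns an error tuple instead of folding a run-time failure into nomatch. The module and function names (re_limit_check, check/0) are made up for illustration, and this assumes the installed OTP release supports report_errors.</div>

```erlang
-module(re_limit_check).
-export([check/0]).

%% Runs the original regex on the longer subject, asking re:run/3 to
%% report run-time errors explicitly instead of returning nomatch.
check() ->
    Subject = <<"foo bar is a foo bar is a big yellow boat or sub">>,
    Regex = <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
    Opts = [global, {capture, [1,3], binary}, report_errors],
    case re:run(Subject, Regex, Opts) of
        {match, Captures} -> {matched, Captures};
        nomatch -> no_match;                  %% the regex genuinely did not match
        {error, Reason} -> {aborted, Reason}  %% e.g. match_limit
    end.
```

<div>If the guess is right, re_limit_check:check() should return {aborted, match_limit} for this input rather than no_match. The {match_limit, N} option of re:run/3 can raise the limit, but fixing the regex remains the better cure.</div>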
Fixing the regex to not have repetitions of something that
matches "" helps both the run time:<br>
<br>
9> timer:tc(re, run, [<<"foo bar is a foo bar is a
big yellow boat or">>, <<"^foo (\\w+ *(\\w+ *)*)
is a (\\w+ *(\\w+ *)*)">>, [global, {capture, [1,3],
binary}]]).<br>
{14315,<br>
{match,[[<<"bar is a foo bar">>,<<"big
yellow boat or">>]]}}<br>
</div>
<div>% 74x faster :-)<br>
</div>
<div><br>
</div>
and the result for the longer input string:<br>
<br>
<div>10> timer:tc(re, run, [<<"foo bar is a foo bar is
a big yellow boat or sub">>, <<"^foo (\\w+ *(\\w+
*)*) is a (\\w+ *(\\w+ *)*)">>, [global, {capture,
[1,3], binary}]]). <br>
{49911,<br>
{match,[[<<"bar is a foo bar">>,<br>
<<"big yellow boat or sub">>]]}}<br></div></div></blockquote></div></div></div></blockquote><div>These are good results.</div><div><br></div><div>BTW, I found a more efficient regex:</div>
<div><br></div><div>1> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>
<div>{27190,</div><div> {match,[[<<"bar is a foo bar">>, <<"big yellow boat or sub">>]]}}</div><div><br></div><div>2> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>
<div>{143,</div><div> {match,[[<<"bar is a foo bar">>, <<"big yellow boat or sub">>]]}}</div><div><br></div><div><div>3> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>
<div>{138,</div><div> {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}</div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div><div><blockquote type="cite"><div dir="ltr"><div>
<br>
</div>
<div>/Erik<br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">
2013/10/24 Vyacheslav Levytskyy <span dir="ltr"><<a href="mailto:v.levytskyy@yahoo.com" target="_blank">v.levytskyy@yahoo.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hello,<br>
<br>
According to the 're' module documentation, "the quantifiers
are "greedy", that is, they match as much as possible
(up to the maximum number of permitted times)". This
seems to be the problem in your case. The regex you are
using looks problematic: it forces 're' to
exhaust a very large number of repetitions.<br>
<br>
As an option, you can use the 'ungreedy' option, which makes
quantifiers non-greedy by default, so that only those followed
by "?" are greedy. For example:<br>
re:run(<<"foo bar is a foo bar is a big yellow
boat or sub">>, <<"^foo (\\w(\\w+| )*) is a
(\\w(\\w+?| )*?)">>, [ungreedy, global, {capture,
[1,3], binary}]).<br>
{match,[[<<"bar">>,<br>
<<"foo bar is a big yellow boat or
sub">>]]}<br>
<br>
Best regards,<br>
Vyacheslav Levytskyy
<div>
<div><br>
<br>
On 23.10.2013 22:26, Alexander Petrovsky wrote:<br>
</div>
</div>
</div>
<blockquote type="cite">
<div>
<div>
<div dir="ltr">Hi!
<div><br>
</div>
<div>I have the regex "^foo (\\w+(\\w* *)*) is a
(\\w+(\\w* *)*)", and I get strange behaviour
when I do:</div>
<div><br>
</div>
<div>1> re:run(<<"foo bar is a foo bar is
a big yellow boat or">>, <<"^foo
(\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
[global, {capture, [1,3], binary}]).</div>
<div>{match,[[<<"bar is a foo
bar">>,<<"big yellow boat
or">>]]}</div>
<div><br>
</div>
<div>2> re:run(<<"foo bar is a foo bar is
a big yellow boat or sub">>, <<"^foo
(\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
[global, {capture, [1,3], binary}]).</div>
<div>nomatch </div>
<div><br>
</div>
<div>I tested this regexp in Clojure and Python:</div>
<div><br>
</div>
<div>
<div>=> (re-matches #"foo (\w+(\w* *)*) is a
(\w+(\w* *)*)" "foo bar is a foo bar is a big
yellow boat or")</div>
<div>["foo bar is a foo bar is a big yellow boat
or" "bar is a foo bar" "" "big yellow boat or"
""]</div>
<div><br>
</div>
<div>=> (re-matches #"foo (\w+(\w* *)*) is a
(\w+(\w* *)*)" "foo bar is a foo bar is a big
yellow boat or sub")</div>
<div>["foo bar is a foo bar is a big yellow boat
or sub" "bar is a foo bar" "" "big yellow boat
or sub" ""]</div>
</div>
<div><br>
</div>
<div>
<div>>>> import re</div>
<div>>>> p = re.compile(r'foo (\w+(\w*
*)*) is a (\w+(\w* *)*)')</div>
<div>>>> p.match("foo bar is a foo bar
is a big yellow boat or")</div>
<div><_sre.SRE_Match object at
0x100293c00></div>
<div>>>> p.match("foo bar is a foo bar
is a big yellow boat or sub")</div>
<div><_sre.SRE_Match object at
0x100293ab0></div>
</div>
<div><br>
</div>
<div>Can someone explain why I get nomatch on the second
string, "foo bar is a foo bar is a big yellow
boat or sub"? Is this a bug?</div>
<div><br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">Петровский Александр / Alexander
Petrovsky,<br>
<br>
Skype: askjuise<br>
Jabber: <a href="mailto:juise@jabber.ru" target="_blank">juise@jabber.ru</a><br>
<div>Phone: <a href="tel:%2B7%20914%208%20820%20815" value="+79148820815" target="_blank">+7
914 8 820 815</a> (irkutsk)
<div> <br>
</div>
</div>
</div>
</div>
</div>
<br>
<br>
</div>
</div>
<pre>_______________________________________________
erlang-questions mailing list
<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a>
</pre>
</blockquote>
<br>
</div>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Петровский Александр / Alexander Petrovsky,<br><br>Skype: askjuise<br>Jabber: <a href="mailto:juise@jabber.ru" target="_blank">juise@jabber.ru</a><br>
<div>Phone: +7 914 8 820 815 (irkutsk)<div><br></div></div></div>
</div></div>