<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hello,<br>
<br>
According to the 're' module documentation, quantifiers are
'greedy', that is, "they match as much as possible (up to the
maximum number of permitted times)". This seems to be the problem
in your case. Your regex nests one quantifier inside another, as in
(\\w* *)*, which seems to force 're' into an exhausting number of
repetitions while it backtracks.<br>
<br>
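If the backtracking explanation is right, the nomatch may mean
that 're' gave up, not that the subject cannot match. A minimal
sketch of how to check this, assuming your OTP release supports the
'report_errors' option of re:run/3 (the module and function names
are made up for illustration):<br>
<pre>
```erlang
-module(re_limit_sketch).
-export([check/0]).

%% Hypothetical helper: runs the original pattern with report_errors so
%% that a runtime error is visible instead of being reported as nomatch.
check() ->
    Subject = <<"foo bar is a foo bar is a big yellow boat or sub">>,
    Pattern = <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
    case re:run(Subject, Pattern,
                [report_errors, {capture, [1, 3], binary}]) of
        {match, Captures} -> {ok, Captures};
        nomatch           -> no_match;
        %% e.g. Reason =:= match_limit when the matcher hits its step limit
        {error, Reason}   -> {gave_up, Reason}
    end.
```
</pre>
With 'report_errors', runtime errors such as match_limit come back
as {error, Reason} instead of being folded into nomatch, so the two
cases can be told apart. Which branch is taken for this input is
worth checking on your own system.<br>
<br>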
One way around this is the 'ungreedy' option, which makes
quantifiers non-greedy by default; a quantifier followed by "?" then
becomes greedy again. For example:<br>
re:run(<<"foo bar is a foo bar is a big yellow boat or
sub">>, <<"^foo (\\w(\\w+| )*) is a (\\w(\\w+?|
)*?)">>, [ungreedy, global, {capture, [1,3], binary}]).<br>
{match,[[<<"bar">>,<br>
<<"foo bar is a big yellow boat or sub">>]]}<br>
<br>
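As an alternative to the ungreedy pattern above, the nested
quantifier can be avoided altogether. The sketch below is only an
assumption about what you want to capture: it treats each side as
words separated by exactly one space, which is stricter than your
original (\\w* *)*, and it uses non-capturing groups, so the
captures are numbered 1 and 2. The module and function names are
made up for illustration:<br>
<pre>
```erlang
-module(re_greedy_sketch).
-export([run/0]).

%% Hypothetical rewrite of the pattern: every repetition of (?: \\w+)
%% must consume a space followed by at least one word character, so
%% there is only one way to split a given run of words.
run() ->
    Subject = <<"foo bar is a foo bar is a big yellow boat or sub">>,
    Pattern = <<"^foo (\\w+(?: \\w+)*) is a (\\w+(?: \\w+)*)">>,
    re:run(Subject, Pattern, [{capture, [1, 2], binary}]).
```
</pre>
Because the first group is still greedy, it backs off one word at a
time until " is a " follows, so for this subject it should capture
"bar is a foo bar" and "big yellow boat or sub", the same split that
Clojure and Python report.<br>
<br>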
Best regards,<br>
Vyacheslav Levytskyy<br>
<br>
On 23.10.2013 22:26, Alexander Petrovsky wrote:<br>
</div>
<blockquote
cite="mid:CAH57y_QFyWQQs_DAw9e6qbDUSt4BAP78tND3cUR9qGJtH-41hg@mail.gmail.com"
type="cite">
<div dir="ltr">Hi!
<div><br>
</div>
<div>I have the regex "^foo (\\w+(\\w* *)*) is a (\\w+(\\w*
*)*)", and I get strange behaviour when I do:</div>
<div><br>
</div>
<div>1> re:run(<<"foo bar is a foo bar is a big yellow
boat or">>, <<"^foo (\\w+(\\w* *)*) is a
(\\w+(\\w* *)*)">>, [global, {capture, [1,3], binary}]).</div>
<div>{match,[[<<"bar is a foo bar">>,<<"big
yellow boat or">>]]}</div>
<div><br>
</div>
<div>2> re:run(<<"foo bar is a foo bar is a big yellow
boat or sub">>, <<"^foo (\\w+(\\w* *)*) is a
(\\w+(\\w* *)*)">>, [global, {capture, [1,3], binary}]).</div>
<div>nomatch </div>
<div><br>
</div>
<div>I tested this regexp in Clojure and Python:</div>
<div><br>
</div>
<div>
<div>=> (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)"
"foo bar is a foo bar is a big yellow boat or")</div>
<div>["foo bar is a foo bar is a big yellow boat or" "bar is a
foo bar" "" "big yellow boat or" ""]</div>
<div><br>
</div>
<div>=> (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)"
"foo bar is a foo bar is a big yellow boat or sub")</div>
<div>["foo bar is a foo bar is a big yellow boat or sub" "bar
is a foo bar" "" "big yellow boat or sub" ""]</div>
</div>
<div><br>
</div>
<div>
<div>>>> import re</div>
<div>>>> p = re.compile('foo (\w+(\w* *)*) is a
(\w+(\w* *)*)')</div>
<div>>>> p.match("foo bar is a foo bar is a big
yellow boat or")</div>
<div><_sre.SRE_Match object at 0x100293c00></div>
<div>>>> p.match("foo bar is a foo bar is a big
yellow boat or sub")</div>
<div><_sre.SRE_Match object at 0x100293ab0></div>
</div>
<div><br>
</div>
<div>Can someone explain why I get nomatch on the second string,
"foo bar is a foo bar is a big yellow boat or sub"? Is this a
bug?</div>
<div><br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">Петровский Александр / Alexander Petrovsky,<br>
<br>
Skype: askjuise<br>
Jabber: <a moz-do-not-send="true"
href="mailto:juise@jabber.ru" target="_blank">juise@jabber.ru</a><br>
<div>Phone: +7 914 8 820 815 (irkutsk)
<div>
<br>
</div>
</div>
</div>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>