<div dir="ltr"><div><div><div><div><div><div>Being greedy or not shouldn't change whether the regex matches.<br></div>I believe the issue is something else:<br></div>It's bad practice to repeat something that can match the empty string, such as (\w* *)*, because the empty match could be repeated any number of times, leaving the matcher a huge number of equivalent ways to split the input.<br>
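<div>As a quick illustration that the inner group really can match nothing at all, here is a minimal sketch (the module and function names are made up for this example) that anchors the group on an empty subject:<br></div>
<pre>
-module(empty_group_check).
-export([run/0]).

%% Hypothetical helper, not part of the original regex discussion.
%% Both \w* and " *" may match zero characters, so the whole group
%% (\w* *) succeeds on an empty binary.
%% Expected: {match,[<<>>,<<>>]}
run() ->
    re:run(<<>>, <<"^(\\w* *)$">>, [{capture, all, binary}]).
</pre>
<div>Wrapping such a group in another * is what lets it repeat without consuming any input.<br></div>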

</div>Indeed, the original regex runs pretty slowly (timer:tc reports microseconds, so the run below took about a second):<br><br>8> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3], binary}]]).<br>

{1052818,<br> {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}<br><br></div>My guess, therefore, is that on the longer input string the regexp times out or reaches some limit on matching effort, and that 're' then reports this as nomatch.<br>
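<div>One way to check that guess, as a minimal sketch: re:run/3 accepts a report_errors option, which should make run-time errors come back as an error tuple instead of a plain nomatch. The module and function names below are made up for illustration, and I have not confirmed what this returns on every OTP release:<br></div>
<pre>
-module(re_limit_check).
-export([run/0]).

%% Hypothetical helper, not from the thread: re-runs the original pattern
%% on the longer subject and asks re:run/3 to report run-time errors.
run() ->
    Subject = <<"foo bar is a foo bar is a big yellow boat or sub">>,
    Pattern = <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
    %% With report_errors, hitting an internal limit should surface as
    %% {error, Reason} rather than being folded into nomatch.
    re:run(Subject, Pattern,
           [global, report_errors, {capture, [1,3], binary}]).
</pre>
<div>If this comes back as {error, match_limit}, the nomatch is a limit being hit rather than a genuine failure to match.<br></div>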

<br></div>Rewriting the regex so that no repeated group can match "" helps both the run time:<br><br>9> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>, [global, {capture, [1,3], binary}]]).<br>

{14315,<br> {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}<br></div><div>% 74x faster :-)<br></div><div><br></div>and the result for the longer input string:<br><br>

<div>10> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>, [global, {capture, [1,3], binary}]]). <br>

{49911,<br> {match,[[<<"bar is a foo bar">>,<br>          <<"big yellow boat or sub">>]]}}<br><br></div><div>/Erik<br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">

2013/10/24 Vyacheslav Levytskyy <span dir="ltr"><<a href="mailto:v.levytskyy@yahoo.com" target="_blank">v.levytskyy@yahoo.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    <div>Hello,<br>

      <br>

      According to the 're' module documentation, "the quantifiers are "greedy", that is, they match as much as possible (up to the maximum number of permitted times)". This seems to be the problem in your case. The regex you are using is a bit problematic, forcing 're' into exhaustive repetition.<br>
      <br>
      One option is the 'ungreedy' option, which makes quantifiers non-greedy by default; only those followed by "?" remain greedy. See for example:<br>

      re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w(\\w+| )*) is a (\\w(\\w+?| )*?)">>, [ungreedy, global, {capture, [1,3], binary}]).<br>
      {match,[[<<"bar">>,<br>
               <<"foo bar is a big yellow boat or sub">>]]}<br>

      <br>

      Best regards,<br>

      Vyacheslav Levytskyy<div><div class="h5"><br>

      <br>

      On 23.10.2013 22:26, Alexander Petrovsky wrote:<br>

    </div></div></div>

    <blockquote type="cite"><div><div class="h5">

      <div dir="ltr">Hi!

        <div><br>

        </div>

        <div>I have the regex "^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)", and I get strange behaviour when I do:</div>

        <div><br>

        </div>

        <div>1> re:run(<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3], binary}]).</div>
        <div>{match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}</div>

        <div><br>

        </div>

        <div>2> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3], binary}]).</div>

        <div>nomatch </div>

        <div><br>

        </div>

        <div>I tested this regexp in clojure and python:</div>

        <div><br>

        </div>

        <div>

          <div>=> (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)" "foo bar is a foo bar is a big yellow boat or")</div>
          <div>["foo bar is a foo bar is a big yellow boat or" "bar is a foo bar" "" "big yellow boat or" ""]</div>
          <div><br>
          </div>
          <div>=> (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)" "foo bar is a foo bar is a big yellow boat or sub")</div>
          <div>["foo bar is a foo bar is a big yellow boat or sub" "bar is a foo bar" "" "big yellow boat or sub" ""]</div>

        </div>

        <div><br>

        </div>

        <div>

          <div>>>> import re</div>

          <div>>>> p = re.compile('foo (\w+(\w* *)*) is a (\w+(\w* *)*)')</div>
          <div>>>> p.match("foo bar is a foo bar is a big yellow boat or")</div>
          <div><_sre.SRE_Match object at 0x100293c00></div>
          <div>>>> p.match("foo bar is a foo bar is a big yellow boat or sub")</div>

          <div><_sre.SRE_Match object at 0x100293ab0></div>

        </div>

        <div><br>

        </div>

        <div>Can someone explain why I get nomatch on the second string, "foo bar is a foo bar is a big yellow boat or sub"? Is this a bug?</div>

        <div><br clear="all">

          <div><br>

          </div>

          -- <br>

          <div dir="ltr">Петровский Александр / Alexander Petrovsky

          </div>

        </div>

      </div>

      <br>


      <br>

      </div></div><pre>_______________________________________________

erlang-questions mailing list

<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>

<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a>

</pre>

    </blockquote>

    <br>

  </div>


<br></blockquote></div><br></div>