<div dir="ltr">Thanks for the explanation.<div class="gmail_extra"><br><br><div class="gmail_quote">2013/10/24 Vyacheslav Levytskyy <span dir="ltr"><<a href="mailto:v.levytskyy@yahoo.com" target="_blank">v.levytskyy@yahoo.com</a>></span><br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    <div>Agreed, it is similar to what I wrote: the initial regex was the
      problem. The exact value of the limit is not so important; from the
      documentation we know that one exists ("maximum number of permitted
      times"), and in any case it is the regex that should be fixed.<span><font color="#888888"><br>

      <br>

      Vyacheslav</font></span><div><div><br>

      <br>

      On 24.10.2013 16:36, Erik Søe Sørensen wrote:<br>

    </div></div></div><div><div>

    <blockquote type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>

              <div>

                <div>

                  <div>Being greedy or not shouldn't change whether the

                    regex matches or not.<br>

                  </div>

                  I believe the issue is something else:<br>

                </div>

                It's bad practice to have repetitions of something that
                matches the empty string, such as (\w* *)*, because
                that could be repeated any number of times.<br>

              </div>
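A quick way to see the difference between the two repeated units is to ask whether each one can match the empty string. This minimal sketch uses Python 3.4+ and its standard `re` module (not Erlang's PCRE-based `re`), as the original poster already did for cross-checking; the variable names are illustrative only, and the check says nothing about run time:

```python
import re

# The repeated unit in the original pattern, (\w* *), has two optional
# parts, so it can match the empty string.
original_unit_matches_empty = re.fullmatch(r"\w* *", "") is not None

# The repaired unit, (\w+ *), has to consume at least one word character,
# so a zero-length iteration is impossible.
fixed_unit_matches_empty = re.fullmatch(r"\w+ *", "") is not None

print(original_unit_matches_empty)  # True
print(fixed_unit_matches_empty)     # False
```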

              Indeed, the original regex runs pretty slowly:<br>

              <br>

              8> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3], binary}]]).<br>
              {1052818,<br>
               {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}<br>

              <br>

            </div>

            My guess, therefore, is that the regexp times out or reaches
            some run-duration limit on the longer input string.<br>

            <br>

          </div>

          Fixing the regex so that it has no repetitions of something
          that matches "" helps both the run time:<br>

          <br>

          9> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>, [global, {capture, [1,3], binary}]]).<br>
          {14315,<br>
           {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}<br>

        </div>

        <div>% 74x faster :-)<br>

        </div>
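The repaired pattern can also be cross-checked outside Erlang. This sketch assumes Python 3 and its standard `re` module rather than Erlang's PCRE-based `re`; on these two inputs both backtracking engines try the greedy choices first, so the captures should agree with the shell output quoted here. The names `FIXED`, `INPUTS` and `results` are illustrative only:

```python
import re

# The repaired pattern: every repeated (\w+ *) unit consumes at least
# one word character, so no iteration can match the empty string.
FIXED = re.compile(r"^foo (\w+ *(\w+ *)*) is a (\w+ *(\w+ *)*)")

INPUTS = [
    "foo bar is a foo bar is a big yellow boat or",
    "foo bar is a foo bar is a big yellow boat or sub",
]

# Groups 1 and 3 play the role of {capture, [1,3], binary} in re:run/3.
results = [FIXED.match(text).group(1, 3) for text in INPUTS]

for captured in results:
    print(captured)
```

Both inputs match, with group 1 capturing "bar is a foo bar" and group 3 running to the end of each string.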

        <div><br>

        </div>

        and the result for the longer input string:<br>

        <br>

        <div>10> timer:tc(re, run, [<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>, [global, {capture, [1,3], binary}]]).<br>
          {49911,<br>
           {match,[[<<"bar is a foo bar">>,<br>
                    <<"big yellow boat or sub">>]]}}<br></div></div></blockquote></div></div></div></blockquote><div>These are good results.</div><div><br></div><div>BTW, I found a more efficient regex:</div>

<div><br></div><div>1> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>

<div>{27190,</div><div> {match,[[<<"bar is a foo bar">>, <<"big yellow boat or sub">>]]}}</div><div><br></div><div>2> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>

<div>{143,</div><div> {match,[[<<"bar is a foo bar">>, <<"big yellow boat or sub">>]]}}</div><div><br></div><div><div>3> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>

<div>{138,</div><div> {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}</div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000"><div><div><blockquote type="cite"><div dir="ltr"><div>

          <br>

        </div>

        <div>/Erik<br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <br>

        <div class="gmail_quote">

          2013/10/24 Vyacheslav Levytskyy <span dir="ltr"><<a href="mailto:v.levytskyy@yahoo.com" target="_blank">v.levytskyy@yahoo.com</a>></span><br>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

            <div bgcolor="#FFFFFF" text="#000000">

              <div>Hello,<br>

                <br>

                According to the 're' module documentation, "the quantifiers
                are "greedy", that is, they match as much as possible
                (up to the maximum number of permitted times)". This
                seems to be the problem in your case: the regex you are
                using is a bit problematic and forces 're' into
                exhaustive repetition.<br>

                <br>

                As an option, you can use the 'ungreedy' option and make
                only some of the quantifiers greedy by following them
                with "?". See for example:<br>

                re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w(\\w+| )*) is a (\\w(\\w+?| )*?)">>, [ungreedy, global, {capture, [1,3], binary}]).<br>
                {match,[[<<"bar">>,<br>
                         <<"foo bar is a big yellow boat or sub">>]]}<br>

                <br>
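The effect of the `ungreedy` option can be reproduced in Python for comparison. This is a sketch under stated assumptions: Python's `re` has no `ungreedy` flag, so the swap that Erlang's option performs (`*` behaves like `*?` and vice versa, likewise `+` and `+?`) is applied to the pattern by hand. Putting it next to an all-greedy version of the same shape shows that greediness changes which text each group captures, not whether the pattern matches. The names `lazy_first` and `all_greedy` are illustrative only:

```python
import re

TEXT = "foo bar is a foo bar is a big yellow boat or sub"

# Erlang's `ungreedy` option swaps the meaning of `*` and `*?` (and of
# `+` and `+?`). Python's re has no such flag, so the swap is applied
# by hand to the pattern from the message:
#   (\w+| )*    under ungreedy  ->  (\w+?| )*?   (lazy)
#   (\w+?| )*?  under ungreedy  ->  (\w+| )*     (greedy)
lazy_first = re.compile(r"^foo (\w(\w+?| )*?) is a (\w(\w+| )*)")

# For contrast, the same shape with every quantifier left greedy.
all_greedy = re.compile(r"^foo (\w(\w+| )*) is a (\w(\w+| )*)")

lazy_groups = lazy_first.match(TEXT).group(1, 3)
greedy_groups = all_greedy.match(TEXT).group(1, 3)

print(lazy_groups)    # ('bar', 'foo bar is a big yellow boat or sub')
print(greedy_groups)  # ('bar is a foo bar', 'big yellow boat or sub')
```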

                Best regards,<br>

                Vyacheslav Levytskyy

                <div>

                  <div><br>

                    <br>

                    On 23.10.2013 22:26, Alexander Petrovsky wrote:<br>

                  </div>

                </div>

              </div>

              <blockquote type="cite">

                <div>

                  <div>

                    <div dir="ltr">Hi!

                      <div><br>

                      </div>

                      <div>I have the regex "^foo (\\w+(\\w* *)*) is a
                        (\\w+(\\w* *)*)", and I get strange behaviour
                        when I do:</div>

                      <div><br>

                      </div>

                      <div>1> re:run(<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3], binary}]).</div>
                      <div>{match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}</div>
                      <div><br>
                      </div>
                      <div>2> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global, {capture, [1,3], binary}]).</div>
                      <div>nomatch</div>

                      <div><br>

                      </div>

                      <div>I tested this regexp in Clojure and Python:</div>

                      <div><br>

                      </div>

                      <div>

                        <div>=> (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)" "foo bar is a foo bar is a big yellow boat or")</div>
                        <div>["foo bar is a foo bar is a big yellow boat or" "bar is a foo bar" "" "big yellow boat or" ""]</div>
                        <div><br>
                        </div>
                        <div>=> (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)" "foo bar is a foo bar is a big yellow boat or sub")</div>
                        <div>["foo bar is a foo bar is a big yellow boat or sub" "bar is a foo bar" "" "big yellow boat or sub" ""]</div>

                      </div>

                      <div><br>

                      </div>

                      <div>

                        <div>>>> import re</div>

                        <div>>>> p = re.compile('foo (\w+(\w* *)*) is a (\w+(\w* *)*)')</div>
                        <div>>>> p.match("foo bar is a foo bar is a big yellow boat or")</div>
                        <div><_sre.SRE_Match object at 0x100293c00></div>
                        <div>>>> p.match("foo bar is a foo bar is a big yellow boat or sub")</div>
                        <div><_sre.SRE_Match object at 0x100293ab0></div>

                      </div>

                      <div><br>

                      </div>

                      <div>Can someone explain to me why I get nomatch on
                        the second string, "foo bar is a foo bar is a big
                        yellow boat or sub"? Is this a bug?</div>

                      <div><br clear="all">

                        <div><br>

                        </div>

                        -- <br>

                        <div dir="ltr">Петровский Александр / Alexander
                          Petrovsky,<br>

                          <br>

                          Skype: askjuise<br>

                          Jabber: <a href="mailto:juise@jabber.ru" target="_blank">juise@jabber.ru</a><br>

                          <div>Phone: <a href="tel:%2B7%20914%208%20820%20815" value="+79148820815" target="_blank">+7

                              914 8 820 815</a> (irkutsk)

                            <div> <br>

                            </div>

                          </div>

                        </div>

                      </div>

                    </div>

                    <br>

                    <fieldset></fieldset>

                    <br>

                  </div>

                </div>

                <pre>_______________________________________________

erlang-questions mailing list

<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>

<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a>

</pre>

              </blockquote>

              <br>

            </div>

            <br>


          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

  </div></div></div>

</blockquote></div><br>

</div></div>