<div dir="ltr">Thanks for the explanation.<div class="gmail_extra"><br><br><div class="gmail_quote">2013/10/24 Vyacheslav Levytskyy <span dir="ltr"><<a href="mailto:v.levytskyy@yahoo.com" target="_blank">v.levytskyy@yahoo.com</a>></span><br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    <div>Agreed, this is similar to what I wrote: the initial regex was
      the problem. The exact value of the limit is not so important;
      from the documentation we know that it exists ("maximum number of
      permitted times"), and in any case it is the regex that should be
      fixed.<span><font color="#888888"><br>
      <br>
      Vyacheslav</font></span><div><div><br>
      <br>
      On 24.10.2013 16:36, Erik Søe Sørensen wrote:<br>
    </div></div></div><div><div>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>Being greedy or not shouldn't change whether the
                    regex matches or not.<br>
                  </div>
                  I believe the issue is something else...:<br>
                </div>
                It's bad practice to have repetitions of something that
                matches the empty string - such as (\w* *)* - because
                that could be repeated any number of times.<br>
              </div>
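              <div>A minimal sketch of how to check whether a sub-pattern can match the empty string before wrapping it in a repetition. The module name empty_rep and the helper matches_empty/1 are made up for illustration; only re:run/3 and its {capture, none} option come from the 're' module.</div>

```erlang
-module(empty_rep).
-export([matches_empty/1, demo/0]).

%% Hypothetical helper: true when Pattern, anchored at both ends,
%% matches the empty binary, i.e. when ( Pattern )* would repeat
%% something that can consume nothing.
matches_empty(Pattern) ->
    Anchored = ["^(?:", Pattern, ")$"],
    %% {capture, none} makes re:run/3 return the bare atom match.
    re:run(<<>>, Anchored, [{capture, none}]) =:= match.

demo() ->
    %% Inner group of the original regex: \w* and " *" can both be empty.
    true = matches_empty("\\w* *"),
    %% Inner group of the fixed regex: \w+ needs at least one character.
    false = matches_empty("\\w+ *"),
    ok.
```

              <div>Calling empty_rep:demo() should return ok if both assertions hold.</div>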
              Indeed, the original regex runs pretty slowly:<br>
              <br>
              8> timer:tc(re, run, [<<"foo bar is a foo bar is
              a big yellow boat or">>, <<"^foo (\\w+(\\w*
              *)*) is a (\\w+(\\w* *)*)">>, [global, {capture,
              [1,3], binary}]]). <br>
              {1052818,<br>
               {match,[[<<"bar is a foo bar">>,<<"big
              yellow boat or">>]]}}<br>
              <br>
            </div>
            My guess, therefore, is that the regexp times out/reaches
            some run-duration limit on the longer input string.<br>
            <br>
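            <div>One way to test this guess, sketched under the assumption that the installed OTP release supports the report_errors option of re:run/3: with that option, runtime errors such as match_limit are returned as an error tuple instead of being reported as a plain nomatch. The module name re_limit and the function classify/2 are illustrative only.</div>

```erlang
-module(re_limit).
-export([classify/2]).

%% Hypothetical helper: runs RE against Subject and reports why there
%% is no match, if there is none. report_errors asks re:run/3 to return
%% {error, Reason} for compile and runtime errors.
classify(Subject, RE) ->
    case re:run(Subject, RE, [report_errors, {capture, none}]) of
        match -> matched;
        nomatch -> no_match;
        {error, match_limit} -> hit_match_limit;
        {error, match_limit_recursion} -> hit_recursion_limit;
        {error, Other} -> {error, Other}
    end.
```

            <div>Running re_limit:classify/2 on the longer input string with the original regex would then show whether the nomatch is a genuine failure to match or a limit being reached; the outcome may differ between OTP and PCRE versions.</div>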
          </div>
          Fixing the regex to not have repetitions of something that
          matches "" helps both the run time:<br>
          <br>
          9> timer:tc(re, run, [<<"foo bar is a foo bar is a
          big yellow boat or">>, <<"^foo (\\w+ *(\\w+ *)*)
          is a (\\w+ *(\\w+ *)*)">>, [global, {capture, [1,3],
          binary}]]).<br>
          {14315,<br>
           {match,[[<<"bar is a foo bar">>,<<"big
          yellow boat or">>]]}}<br>
        </div>
        <div>% 74x faster :-)<br>
        </div>
        <div><br>
        </div>
        and the result for the longer input string:<br>
        <br>
        <div>10> timer:tc(re, run, [<<"foo bar is a foo bar is
          a big yellow boat or sub">>, <<"^foo (\\w+ *(\\w+
          *)*) is a (\\w+ *(\\w+ *)*)">>, [global, {capture,
          [1,3], binary}]]). <br>
          {49911,<br>
           {match,[[<<"bar is a foo bar">>,<br>
                    <<"big yellow boat or sub">>]]}}<br></div></div></blockquote></div></div></div></blockquote><div>These are good results.</div><div><br></div><div>BTW, I found a more efficient regex:</div>
<div><br></div><div>1> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>
<div>{27190,</div><div> {match,[[<<"bar is a foo bar">>, <<"big yellow boat or sub">>]]}}</div><div><br></div><div>2> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>
<div>{143,</div><div> {match,[[<<"bar is a foo bar">>, <<"big yellow boat or sub">>]]}}</div><div><br></div><div><div>3> timer:tc(fun() -> re:run(<<"foo bar is a foo bar is a big yellow boat or">>, <<"^foo (\\w+(\\w| )*) is a (\\w+(\\w| )*)$">>, [global, {capture, [1,3], binary}]) end).</div>
<div>{138,</div><div> {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}</div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000"><div><div><blockquote type="cite"><div dir="ltr"><div>
          <br>
        </div>
        <div>/Erik<br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">
          2013/10/24 Vyacheslav Levytskyy <span dir="ltr"><<a href="mailto:v.levytskyy@yahoo.com" target="_blank">v.levytskyy@yahoo.com</a>></span><br>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000">
              <div>Hello,<br>
                <br>
                According to the 're' module documentation, "the quantifiers
                are "greedy", that is, they match as much as possible
                (up to the maximum number of permitted times)". This
                seems to be the problem in your case. The regex you are
                using is a bit problematic: it forces 're' into
                exhaustive repetition.<br>
                <br>
                Alternatively, you can use the 'ungreedy' option and make
                only some of the quantifiers greedy by following them with "?".
                For example:<br>
                re:run(<<"foo bar is a foo bar is a big yellow
                boat or sub">>, <<"^foo (\\w(\\w+| )*) is a
                (\\w(\\w+?| )*?)">>, [ungreedy, global, {capture,
                [1,3], binary}]).<br>
                {match,[[<<"bar">>,<br>
                         <<"foo bar is a big yellow boat or
                sub">>]]}<br>
                <br>
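                <div>A small sketch of what the 'ungreedy' option does on its own, using a deliberately tiny subject so the effect is easy to see. The module name ungreedy_demo and the helper first/3 are illustrative; ungreedy and {capture, first, binary} are options of re:run/3.</div>

```erlang
-module(ungreedy_demo).
-export([first/3, demo/0]).

%% Hypothetical helper: returns the first whole match of RE in Subject
%% as a binary, or nomatch.
first(Subject, RE, Opts) ->
    case re:run(Subject, RE, [{capture, first, binary} | Opts]) of
        {match, [Bin]} -> Bin;
        nomatch -> nomatch
    end.

demo() ->
    %% Default: + is greedy and takes all three characters.
    <<"aaa">> = first(<<"aaa">>, "a+", []),
    %% ungreedy flips the default, so + stops after one character.
    <<"a">> = first(<<"aaa">>, "a+", [ungreedy]),
    %% Under ungreedy, a trailing ? makes the quantifier greedy again.
    <<"aaa">> = first(<<"aaa">>, "a+?", [ungreedy]),
    ok.
```

                <div>Greediness only changes which match is found first; it does not by itself remove the repeated backtracking caused by a group that can match the empty string.</div>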
                Best regards,<br>
                Vyacheslav Levytskyy
                <div>
                  <div><br>
                    <br>
                    On 23.10.2013 22:26, Alexander Petrovsky wrote:<br>
                  </div>
                </div>
              </div>
              <blockquote type="cite">
                <div>
                  <div>
                    <div dir="ltr">Hi!
                      <div><br>
                      </div>
                      <div>I have the regex "^foo (\\w+(\\w* *)*) is a
                        (\\w+(\\w* *)*)", and I get strange behaviour
                        when I do:</div>
                      <div><br>
                      </div>
                      <div>1> re:run(<<"foo bar is a foo bar is
                        a big yellow boat or">>, <<"^foo
                        (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
                        [global, {capture, [1,3], binary}]).</div>
                      <div>{match,[[<<"bar is a foo
                        bar">>,<<"big yellow boat
                        or">>]]}</div>
                      <div><br>
                      </div>
                      <div>2> re:run(<<"foo bar is a foo bar is
                        a big yellow boat or sub">>, <<"^foo
                        (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
                        [global, {capture, [1,3], binary}]).</div>
                      <div>nomatch </div>
                      <div><br>
                      </div>
                      <div>I tested this regexp in Clojure and Python:</div>
                      <div><br>
                      </div>
                      <div>
                        <div>=> (re-matches #"foo (\w+(\w* *)*) is a
                          (\w+(\w* *)*)" "foo bar is a foo bar is a big
                          yellow boat or")</div>
                        <div>["foo bar is a foo bar is a big yellow boat
                          or" "bar is a foo bar" "" "big yellow boat or"
                          ""]</div>
                        <div><br>
                        </div>
                        <div>=> (re-matches #"foo (\w+(\w* *)*) is a
                          (\w+(\w* *)*)" "foo bar is a foo bar is a big
                          yellow boat or sub")</div>
                        <div>["foo bar is a foo bar is a big yellow boat
                          or sub" "bar is a foo bar" "" "big yellow boat
                          or sub" ""]</div>
                      </div>
                      <div><br>
                      </div>
                      <div>
                        <div>>>> import re</div>
                        <div>>>> p = re.compile('foo (\w+(\w*
                          *)*) is a (\w+(\w* *)*)')</div>
                        <div>>>> p.match("foo bar is a foo bar
                          is a big yellow boat or")</div>
                        <div><_sre.SRE_Match object at
                          0x100293c00></div>
                        <div>>>> p.match("foo bar is a foo bar
                          is a big yellow boat or sub")</div>
                        <div><_sre.SRE_Match object at
                          0x100293ab0></div>
                      </div>
                      <div><br>
                      </div>
                      <div>Can someone explain why I get nomatch on the
                        second string, "foo bar is a foo bar is a big
                        yellow boat or sub"? Is this a bug?</div>
                      <div><br clear="all">
                        <div><br>
                        </div>
                        -- <br>
                        <div dir="ltr">Петровский Александр / Alexander
                          Petrovsky,<br>
                          <br>
                          Skype: askjuise<br>
                          Jabber: <a href="mailto:juise@jabber.ru" target="_blank">juise@jabber.ru</a><br>
                          <div>Phone: <a href="tel:%2B7%20914%208%20820%20815" value="+79148820815" target="_blank">+7
                              914 8 820 815</a> (irkutsk)
                            <div> <br>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                    <br>
                    <br>
                  </div>
                </div>
                <pre>_______________________________________________
erlang-questions mailing list
<a href="mailto:erlang-questions@erlang.org" target="_blank">erlang-questions@erlang.org</a>
<a href="http://erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://erlang.org/mailman/listinfo/erlang-questions</a>
</pre>
              </blockquote>
              <br>
            </div>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Петровский Александр / Alexander Petrovsky,<br><br>Skype: askjuise<br>Jabber: <a href="mailto:juise@jabber.ru" target="_blank">juise@jabber.ru</a><br>

<div>Phone: +7 914 8 820 815 (irkutsk)<div><br></div></div></div>
</div></div>