[erlang-questions] run strange behaviour

Vyacheslav Levytskyy
Thu Oct 24 16:03:17 CEST 2013


Agreed, this matches what I wrote: the initial regex was the 
problem. The exact value of the limit is not that important; from the 
documentation we know that it exists ("maximum number of permitted 
times"), and in any case it is the regex that should be fixed.
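
To check whether a limit is really what turns the match into nomatch, 
one option is to pass report_errors to re:run. The module below is only 
a sketch: the module name re_limit_demo and the function run/0 are made 
up for illustration, the subject and patterns are copied from this 
thread, and it assumes an OTP release whose re module supports the 
report_errors option. With the default limits I would expect the second 
call to return an error tuple such as {error, match_limit} instead of 
nomatch, but I have not checked that on every release.

```erlang
-module(re_limit_demo).
-export([run/0]).

%% Subject and patterns are copied verbatim from the thread.
-define(SUBJECT, <<"foo bar is a foo bar is a big yellow boat or sub">>).
-define(ORIGINAL, <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>).
-define(FIXED, <<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>).

run() ->
    Opts = [global, {capture, [1,3], binary}],
    %% Without report_errors, hitting an internal limit is reported
    %% as a plain nomatch.
    Silent = re:run(?SUBJECT, ?ORIGINAL, Opts),
    %% With report_errors, run-time errors are returned explicitly as
    %% {error, Reason} (assumption: the default limits are in effect,
    %% so Reason is expected to be match_limit here).
    Loud = re:run(?SUBJECT, ?ORIGINAL, [report_errors | Opts]),
    %% The rewritten pattern from the thread, for comparison.
    Fixed = re:run(?SUBJECT, ?FIXED, [report_errors | Opts]),
    {Silent, Loud, Fixed}.
```

After c(re_limit_demo). in the shell, re_limit_demo:run(). returns the 
three results as a tuple. The first two calls use the slow pattern, so 
they may each take a noticeable fraction of a second or more.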

Vyacheslav

On 24.10.2013 16:36, Erik Søe Sørensen wrote:
> Being greedy or not shouldn't change whether the regex matches or not.
> I believe the issue is something else...:
> It's bad practice to have repetitions of something that matches the 
> empty string - such as (\w* *)* - because that could be repeated any 
> number of times.
> Indeed, the original regex runs pretty slowly:
>
> 8> timer:tc(re, run,
>       [<<"foo bar is a foo bar is a big yellow boat or">>,
>        <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>,
>        [global, {capture, [1,3], binary}]]).
> {1052818,
>  {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}
>
> My guess, therefore, is that the regexp times out/reaches some 
> run-duration limit on the longer input string.
>
> Fixing the regex to not have repetitions of something that matches "" 
> helps both the run time:
>
> 9> timer:tc(re, run,
>       [<<"foo bar is a foo bar is a big yellow boat or">>,
>        <<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>,
>        [global, {capture, [1,3], binary}]]).
> {14315,
>  {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}}
> % 74x faster :-)
>
> and the result for the longer input string:
>
> 10> timer:tc(re, run,
>       [<<"foo bar is a foo bar is a big yellow boat or sub">>,
>        <<"^foo (\\w+ *(\\w+ *)*) is a (\\w+ *(\\w+ *)*)">>,
>        [global, {capture, [1,3], binary}]]).
> {49911,
>  {match,[[<<"bar is a foo bar">>,
>           <<"big yellow boat or sub">>]]}}
>
> /Erik
>
>
> 2013/10/24 Vyacheslav Levytskyy
>
>     Hello,
>
>     According to the 're' module documentation, "the quantifiers are
>     "greedy", that is, they match as much as possible (up to the
>     maximum number of permitted times)". This seems to be the problem
>     in your case. The regex you are using is problematic: it forces
>     're' into exhaustive repetition.
>
>     Alternatively, you can use the 'ungreedy' option, which makes
>     quantifiers non-greedy by default, so that only those followed
>     by "?" are greedy. For example:
>     re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>,
>     <<"^foo (\\w(\\w+| )*) is a (\\w(\\w+?| )*?)">>, [ungreedy,
>     global, {capture, [1,3], binary}]).
>     {match,[[<<"bar">>,
>              <<"foo bar is a big yellow boat or sub">>]]}
>
>     Best regards,
>     Vyacheslav Levytskyy
>
>
>     On 23.10.2013 22:26, Alexander Petrovsky wrote:
>>     Hi!
>>
>>     I have the regex "^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)",
>>     and I get strange behaviour when I do:
>>
>>     1> re:run(<<"foo bar is a foo bar is a big yellow boat or">>,
>>     <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global,
>>     {capture, [1,3], binary}]).
>>     {match,[[<<"bar is a foo bar">>,<<"big yellow boat or">>]]}
>>
>>     2> re:run(<<"foo bar is a foo bar is a big yellow boat or sub">>,
>>     <<"^foo (\\w+(\\w* *)*) is a (\\w+(\\w* *)*)">>, [global,
>>     {capture, [1,3], binary}]).
>>     nomatch
>>
>>     I tested this regexp in Clojure and Python:
>>
>>     => (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)"
>>                    "foo bar is a foo bar is a big yellow boat or")
>>     ["foo bar is a foo bar is a big yellow boat or"
>>      "bar is a foo bar" "" "big yellow boat or" ""]
>>
>>     => (re-matches #"foo (\w+(\w* *)*) is a (\w+(\w* *)*)"
>>                    "foo bar is a foo bar is a big yellow boat or sub")
>>     ["foo bar is a foo bar is a big yellow boat or sub"
>>      "bar is a foo bar" "" "big yellow boat or sub" ""]
>>
>>     >>> import re
>>     >>> p = re.compile(r'foo (\w+(\w* *)*) is a (\w+(\w* *)*)')
>>     >>> p.match("foo bar is a foo bar is a big yellow boat or")
>>     <_sre.SRE_Match object at 0x100293c00>
>>     >>> p.match("foo bar is a foo bar is a big yellow boat or sub")
>>     <_sre.SRE_Match object at 0x100293ab0>
>>
>>     Can someone explain why I get nomatch on the second string, "foo
>>     bar is a foo bar is a big yellow boat or sub"? Is this a bug?
>>
>>
>>     -- 
>>     Петровский Александр / Alexander Petrovsky,
>>
>>     Skype: askjuise
>>     Phone: +7 914 8 820 815 (Irkutsk)
>>
>>
>>
>>     _______________________________________________
>>     erlang-questions mailing list
>>       <mailto:>
>>     http://erlang.org/mailman/listinfo/erlang-questions
>
