[erlang-questions] Leex scanners and default token matching
Tim Watson
watson.timothy@REDACTED
Sun Jul 1 14:20:40 CEST 2012
Hi Robert,
Thanks for the explanation, that makes sense now. I wasn't sure about the escape sequence, but clearly I had it wrong: the match should be something more like ([\+\*-/]{1}), although my ILLEGAL regex clearly isn't accounting for the '+', which is wrong.
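A bracketed class already matches exactly one character, so no repetition suffix is needed. Inside a class, though, '*-/' is read as a range from '*' to '/' (ASCII 42 to 47), which also admits ',' and '.'. Below is a minimal leex sketch with the dash escaped so it stays literal. The {Op, Line} token shape is a hypothetical choice for illustration.

```erlang
Definitions.

%% Exactly one operator character. \- keeps the dash literal, so it
%% does not form a range between its neighbours.
AOP = [\+\-\*/]

Rules.

%% Hypothetical token shape: {'+', Line}, {'-', Line}, ...
{AOP} : {token, {list_to_atom(TokenChars), TokenLine}}.

Erlang code.
```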
On 1 Jul 2012, at 03:12, Robert Virding wrote:
> The reason for the infinite loop is the macro definition:
>
> AOP = (\\+|-|\\*|/)
>
> The double \\ means you are quoting the '\' not the '+' and '-'. So that regex means:
>
> match one-or-more '+'
Since what I'm quoting is the '\', presumably you mean "match one-or-more '\'", yes?
> or
> match '-'
> or
> match zero-or-more '*' <<<===
Yep, I see why that's screwed up - thanks!
> or
> match '/'
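Dropping one backslash from each escape gives the intended definition. Here is a minimal leex sketch. The token shape is a hypothetical choice, and any other rules are omitted.

```erlang
Definitions.

%% In a .xrl file a single backslash escapes the next character, so
%% \+ and \* are the literal '+' and '*'. Written as \\+ and \\*, the
%% escape applies to a backslash and + and * become repetition
%% operators on it, which lets the \\* branch match the empty string.
AOP = (\+|-|\*|/)

Rules.

{AOP} : {token, {list_to_atom(TokenChars), TokenLine}}.

Erlang code.
```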
>
> This will match zero of any non-matching character, so you get a match but no character is consumed and the scanner loops over the same character again. For matching characters this is not a problem, as you always get the longest match, which is always longer than the empty match. When you add your illegal regex, this is what happens.
>
> Having a regex which contains just a '*'-qualified regex is very dangerous, as it can match the empty string and so create a loop. I know of no good way to handle this, as it is not a bug: it is doing what you told it to do. The only way would be to disallow empty matches.
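One way to avoid the problem is sketched below, assuming the illegal-character rule is meant as a catch-all. Every rule has at least one mandatory character, and the catch-all is '.', which consumes exactly one character. Leex's {error, ErrString} action then reports the offending character instead of looping. The error message text is illustrative only.

```erlang
Definitions.

AOP = (\+|-|\*|/)
WS  = [\s\t\n\r]

Rules.

{AOP} : {token, {list_to_atom(TokenChars), TokenLine}}.
{WS}+ : skip_token.
%% Catch-all for anything else. '.' always consumes exactly one
%% character, so no rule here can produce an empty match.
. : {error, "illegal character: " ++ TokenChars}.

Erlang code.
```

When two rules match input of the same length, leex picks the one that appears first. The catch-all therefore has to come last, or it would shadow the operator rule on single characters like '+'.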
>
> Your string regex '([^''])*' looks a little strange.
>
Well, it's meant to say something more like '([^']+)' [immediately following a single quote, match one or more of any character apart from a single quote, followed immediately by a single quote], but it was late at night! :)
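Here is a sketch of the rule described above. It assumes the token should carry the text without its surrounding quotes. The {string, Line, Chars} shape and the strip_quotes/1 helper are hypothetical.

```erlang
Definitions.

Rules.

%% A quote, one or more non-quote characters, then a closing quote.
%% The mandatory quotes mean this rule can never match the empty string.
'[^']+' : {token, {string, TokenLine, strip_quotes(TokenChars)}}.

Erlang code.

%% Drop the leading and trailing single quote.
strip_quotes(Chars) -> lists:sublist(Chars, 2, length(Chars) - 2).
```

Listing the quote once inside the class is enough; [^''] matches the same characters as [^']. With '+' the empty literal '' is not accepted as a string. Using '*' in the body instead would accept it.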
> Robert