[erlang-questions] Leex scanners and default token matching

Tim Watson watson.timothy@REDACTED
Sun Jul 1 14:20:40 CEST 2012


Hi Robert,

Thanks for the explanation, that makes sense now. I wasn't sure about the escape sequence, but clearly I had it wrong: the match wants to be something more like ([\+\*-/]{1}). My ILLEGAL regex also isn't accounting for the '+', which is wrong.
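
For the archive, here is a minimal sketch of what I think the corrected definitions look like. It is untested, and the WS macro, the token shapes and the catch-all rule are placeholders of mine rather than what is in my real scanner. I used alternation instead of the character class because a '-' between two characters inside [...] is normally read as a range, so [\+\*-/] would probably accept ',' and '.' as well.

```erlang
Definitions.

% A single backslash quotes the operator character itself.
AOP     = (\+|-|\*|/)
WS      = [\s\t\n\r]

Rules.

{AOP}   : {token, {aop, TokenLine, TokenChars}}.
{WS}+   : skip_token.
% Hypothetical catch-all: '.' always consumes exactly one character,
% so this rule can never produce an empty match.
.       : {error, "illegal character: " ++ TokenChars}.

Erlang code.
```

The point of the last rule is that every alternative in the scanner now has to consume at least one character, so the scanner always makes progress.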

On 1 Jul 2012, at 03:12, Robert Virding wrote:

> The reason for the infinite loop is the macro definition:
> 
> AOP     = (\\+|-|\\*|/)
> 
> The double \\ means you are quoting the '\' not the '+' and '*'. So that regex means:
> 
> match one-or-more '+'

Since what I'm quoting is the '\', presumably you mean "match one or more (+) of '\'", yes?

> or
> match '-'
> or
> match zero-or-more '*'    <<<===

Yep, I see why that's screwed up - thanks! 

> or
> match '/'
> 
> This will match zero of any non-matching character so you get a match but no character will be consumed and the scanner will loop over the same character again. For matching characters this is not a problem as you always get the longest match which is always longer than the empty match. When you add your illegal regex this is what happens.
> 
> Having a regex which consists only of '*'-qualified sub-regexes is very dangerous, as it can match the empty string and so create a loop. I know of no good way to handle this, as it is not a bug; it is doing what you told it to do. The only way would be to disallow empty matches.
> 
> Your string regex '([^''])*' looks a little strange.
> 

Well, it's meant to say something more like '([^']+)' [a single quote, immediately followed by one or more of any character apart from a single quote, immediately followed by a closing single quote], but it was late at night! :)
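
A sketch of the string rule along those lines, again untested; STRING, the token shape and the strip_quotes/2 helper are just illustrative names of mine. Using '*' between the quotes instead of '+' should also be safe here, since the two quote characters mean the whole regex can never match the empty input; it would simply allow the empty string '' as a token too.

```erlang
Definitions.

% Opening quote, one or more non-quote characters, closing quote.
STRING  = '[^']+'

Rules.

{STRING} : {token, {string, TokenLine, strip_quotes(TokenChars, TokenLen)}}.

Erlang code.

% Drop the first and last character (the surrounding quotes).
strip_quotes(Chars, Len) ->
    lists:sublist(Chars, 2, Len - 2).
```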

> Robert



