[erlang-questions] Leex scanners and default token matching
Tim Watson
watson.timothy@REDACTED
Sat Jun 30 13:31:04 CEST 2012
Hi all,
I've got a simple Leex scanner that appears to go into a non-terminating state for certain inputs, consuming 100% CPU and quickly eating up all available memory. I found this *very* surprising: should the generated scanner really be able to get itself into this state? Is there some way for me to provide a default rule that will execute if no other regex matches, so that I can return {error, Reason} in that case?
Below is a copy of the xrl file - running scanner:string("aa = !") will cause it to hang. Am I missing some obvious way of preventing this?
Definitions.
COMMA = [,]
PARENS = [\(\)]
L = [A-Za-z_\$]
D = [0-9-]
F = (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?
HEX = 0x[0-9]+
WS = ([\000-\s]|%.*)
S = ({COMMA}|{PARENS})
CMP = (=|>|>=|<|<=|<>)
AOP = (\\+|-|\\*|/)
Rules.
LIKE : {token, {op_like, TokenLine, like}}.
IN : {token, {op_in, TokenLine, in}}.
AND : {token, {op_and, TokenLine, conjunction}}.
OR : {token, {op_or, TokenLine, disjunction}}.
NOT : {token, {op_not, TokenLine, negation}}.
IS{WS}NULL : {token, {op_null, TokenLine, is_null}}.
IS{WS}NOT{WS}NULL : {token, {op_null, TokenLine, not_null}}.
BETWEEN : {token, {op_between, TokenLine, range}}.
ESCAPE : {token, {escape, TokenLine, escape}}.
{CMP} : {token, {op_cmp, TokenLine, atomize(TokenChars)}}.
{AOP} : {token, {op_arith, TokenLine, atomize(TokenChars)}}.
{L}({L}|{D})* : {token, {ident, TokenLine, TokenChars}}.
'([^''])*' : {token, {lit_string, TokenLine, strip(TokenChars,TokenLen)}}.
{S} : {token, {list_to_atom(TokenChars),TokenLine}}.
{D}+ : {token, {lit_int, TokenLine, list_to_integer(TokenChars)}}.
{F} : {token, {lit_flt, TokenLine, list_to_float(TokenChars)}}.
{HEX} : {token, {lit_hex, TokenLine, hex_to_int(TokenChars)}}.
{WS}+ : skip_token.
Erlang code.
strip(TokenChars,TokenLen) ->
    lists:sublist(TokenChars, 2, TokenLen - 2).

hex_to_int([_,_|R]) ->
    {ok,[Int],[]} = io_lib:fread("~16u", R),
    Int.

atomize(TokenChars) ->
    list_to_atom(TokenChars).
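
To make the question concrete, the kind of default rule I have in mind is a catch-all placed as the last entry in the Rules section, roughly as sketched below. This is only a sketch under two assumptions taken from the leex documentation: that a rule action may return {error, ErrString}, and that `.` matches any single character other than a newline. The error text is made up for illustration, and I have not confirmed that adding this rule prevents the hang described above.

% Hypothetical catch-all: any single character that no earlier rule matched.
. : {error, "unexpected character: " ++ TokenChars}.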