ANN: MicroLex - simple lexical scanner
Vladimir Sekissov
svg@REDACTED
Mon Sep 2 22:02:47 CEST 2002
Good day,
MicroLex is a simple DFA-based lexical scanner. I decided to post it
to the list in the hope of getting suggestions and comments. If somebody
finds it useful, I'm ready to post it to the Erlang user contributions list.
MicroLex supports almost all frequently used lex regexps, the predictive
operator, and both long (default) and short matching.
* Grammar
A MicroLex grammar is a list of rules. Order is significant: if the input
matches several rules, the first one in the list is chosen.
* Rules
Rules have three forms:
** {Class, Regexp, FormatFun}
Class     - token class
Regexp    - regular expression
FormatFun = Fun(Class, Line, String)
Line      - current line in the input stream
String    - the longest string matching Regexp
** {Class, Regexp, FormatFun, short}
The same as above, but the shortest string matching Regexp is chosen.
** {Class, Regexp1, '/', Regexp2, FormatFun}
Predictive operator. The input must match Regexp1 followed by Regexp2, but
only the part matching Regexp1 is taken as the token string, and the buffer
position points to the next character after it.
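To make the three matching modes concrete, here is a small sketch that mimics them with the standard `re` module (PCRE) rather than with MicroLex itself. The module and function names are made up for illustration. Greedy and ungreedy quantifiers only coincide with true longest and shortest matching for simple patterns like the ones shown, so treat this as an approximation of the semantics, not as how MicroLex is implemented.

```erlang
-module(match_modes).
-export([long/2, short/2, predictive/3]).

%% Illustration only: built on the standard re module, not on MicroLex.
%% Each function returns {TokenString, RestOfInput} or nomatch.

%% Long (default) form: greedy match anchored at the start of Input.
long(Input, Regexp) ->
    run(Input, "^(?:" ++ Regexp ++ ")", []).

%% Short form: the same pattern with ungreedy quantifiers.
short(Input, Regexp) ->
    run(Input, "^(?:" ++ Regexp ++ ")", [ungreedy]).

%% Predictive form: Regexp1 must be followed by Regexp2 (zero-width
%% lookahead), but only the Regexp1 part is consumed.
predictive(Input, Regexp1, Regexp2) ->
    run(Input, "^(?:" ++ Regexp1 ++ ")(?=" ++ Regexp2 ++ ")", []).

%% Runs the pattern and splits Input into the matched prefix and the rest.
run(Input, Pattern, Opts) ->
    case re:run(Input, Pattern, [{capture, first, list} | Opts]) of
        {match, [Token]} -> {Token, lists:nthtail(length(Token), Input)};
        nomatch -> nomatch
    end.
```

For example, `match_modes:predictive("12.5", "[0-9]+", "\\.")` consumes only the digits and leaves the position at the dot, which is the behaviour described for the `'/'` rule form.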
* Grammar Example
The following simple grammar recognizes integers and floats:
%% Grammar
grammar() ->
    [{ws,          ws(),          ?skip},
     {float_num,   float_num(),   fun yeec_token/3},
     {integer_num, integer_num(), fun yeec_token/3}].

ws() -> ci(" \t\f").

%% Float
%% (+|-)?[0-9]+\.[0-9]+((E|e)(+|-)?[0-9]+)?
float_num() ->
    '@'([integer_num(), c($.), '+'(digit()),
         '?'('@'([ci("Ee"), integer_num()]))]).

integer_num() ->
    '@'(['?'(ci("+-")), '+'(digit())]).

digit() ->
    ci($0, $9).
%% End of Grammar
* Regexps
MicroLex regexp        Lex analog
'@'([R1, R2, ...])     R1R2...
'|'([R1, R2, ...])     R1|R2|...
'*'(R)                 R*
'+'(R)                 R+
'?'(R)                 R?
'.'()                  .
sol(R)                 ^R
eol(R)                 R$
btw(From, To, R)       R{From, To}
c($a)                  a
nc($a)                 [^a]
ci($a, $z)             [a-z]
ci("abc")              [abc]
cni($a, $z)            [^a-z]
cni("abc")             [^abc]
str("abba")            abba
* Scanner
The scanner can be used in batch mode, where the whole input buffer is
processed and a list of tokens is returned, or in continuation style.
If a rule's format function returns an empty list, the token is not included
in the output.
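As a rough illustration of the format-function contract, the sketch below defines one function that builds a token and one that returns an empty list so the match is dropped (presumably what the `?skip` macro in the example grammar stands for). The module name, the token shape `{Class, Line, String}`, and the `collect/1` helper are assumptions made for this example and are not taken from the MicroLex sources.

```erlang
-module(format_funs).
-export([token/3, skip/3, collect/1]).

%% Hypothetical format function: wraps the match in a tuple.
%% The real token shape is up to the grammar author.
token(Class, Line, String) ->
    {Class, Line, String}.

%% Hypothetical "skip" format function: returning [] drops the match.
skip(_Class, _Line, _String) ->
    [].

%% Applies each match's format function and leaves out empty-list results.
%% Matches is a list of {FormatFun, Class, Line, String} (assumed shape).
collect(Matches) ->
    [Tok || {Fun, Class, Line, String} <- Matches,
            Tok <- [Fun(Class, Line, String)],
            Tok =/= []].
```

With this, a whitespace match formatted by `skip/3` leaves no trace in the result, while the numeric matches formatted by `token/3` come through in input order.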
** Errors
On a syntax error the scanner returns the tuple {error, Error}, where
Error = {scan, LineNum, Char, Expect, Str}
LineNum - line number
Char    - first unmatched character
Expect  - character classes the grammar expects at that point
Str     - last recognized characters
It can be formatted into a user-friendly string with format_error/1.
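For readers who want to handle the error tuple themselves, here is a minimal sketch of a formatter. It is not MicroLex's format_error/1; the module name and message wording are invented, and it assumes that Char is a character code, Str is a string, and Expect is any printable term.

```erlang
-module(scan_error).
-export([describe/1]).

%% Hypothetical formatter for the documented error tuple.
%% Assumes Char is a character code and Str is a string; Expect is
%% printed as-is with ~p because its exact shape is not documented here.
describe({error, {scan, LineNum, Char, Expect, Str}}) ->
    lists:flatten(
      io_lib:format("line ~w: unexpected character '~s' after ~p, expected ~p",
                    [LineNum, [Char], Str, Expect])).
```

Calling `scan_error:describe/1` on the scanner's error result yields a flat string suitable for logging or for showing to the user.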
* Examples
There are two example grammars:
mlex_asn1.erl - a subset of the ASN.1 grammar
mlex_freeradius_conf.erl - the cshell-like configuration file grammar of
the FreeRadius package
Test files are in the 'priv/test' directory.
See also mlex_test.erl.
Best Regards,
Vladimir Sekissov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlex.tar.gz
Type: application/octet-stream
Size: 15182 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20020903/4e961eb7/attachment.obj>