A MicroLex grammar is a list of rules. Order is significant: if the input matches several rules, the first one in the list is chosen.
Rules have three forms:
- {Class, Regexp, FormatFun}: the string matched by Regexp is chosen as the token string.
- {Class, Regexp, FormatFun, short}
- {Class, Regexp1, '/', Regexp2, FormatFun}: the input is matched against Regexp1Regexp2, but only the part matched by Regexp1 is chosen as the token string, and the buffer position points to the next character after it.

FormatFun has the form Fun(Class, Line, String) -> {error, Error} | Token. It is called with the rule's Class, the Line of the match, and the String matched by Regexp.
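The rule forms and the ordering rule can be sketched as a small grammar. This is a minimal sketch, not taken from the library's sources: it assumes the combinators documented on this page are exported from a module named mlex, and the module name rule_forms_example, the classes kw_if, fun_name and word, and the {Class, Line, String} token layout are hypothetical choices made for illustration.

```erlang
-module(rule_forms_example).
-export([rules/0, token/3]).

%% Format function with the documented shape Fun(Class, Line, String).
%% The {Class, Line, String} layout of the token is an assumption.
token(Class, Line, String) ->
    {Class, Line, String}.

%% [a-z]+ (assumes the combinators live in a module named mlex)
word() ->
    mlex:'+'(mlex:ci($a, $z)).

rules() ->
    [%% Plain rule. Listed before the general word rules because the
     %% first matching rule in the list is chosen.
     {kw_if, mlex:str("if"), fun token/3},
     %% Lookahead rule: a word is a fun_name only when "(" follows it.
     %% Only the part matched by word() becomes the token string.
     {fun_name, word(), '/', mlex:str("("), fun token/3},
     %% Plain rule for any other word.
     {word, word(), fun token/3}].
```

The list returned by rules/0 is the kind of value that grammar/1 or grammar/2 is documented to compile.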
The following simple grammar recognizes integers and floats:
    %% Grammar
    grammar() ->
        [{ws, ws(), ?skip},
         {float_num, float_num(), fun yeec_token/3},
         {integer_num, integer_num(), fun yeec_token/3}].

    ws() -> ci(" \t\f").

    %% Float
    %% (+|-)?[0-9]+\.[0-9]+((E|e)(+|-)?[0-9]+)?
    float_num() ->
        '@'([integer_num(), c($.), '+'(digit()),
             '?'('@'([ci("Ee"), integer_num()]))]).

    integer_num() -> '@'(['?'(ci("+-")), '+'(digit())]).

    digit() -> ci($0, $9).
    %% End of Grammar
MicroLex regexp | Lex analog |
---|---|
'@'([R1, R2, ...]) | R1R2... |
'\|'([R1, R2, ...]) | R1\|R2\|... |
'*'(R) | R* |
'+'(R) | R+ |
'?'(R) | R? |
'.'() | . |
sol(R) | ^R |
eol(R) | R$ |
btw(From, To, R) | R{From, To} |
c($a) | a |
nc($a) | [^a] |
ci($a, $z) | [a-z] |
ci("abc") | [abc] |
cni($a, $z) | [^a-z] |
cni("abc") | [^abc] |
str("abba") | abba |
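As a worked reading of the table above, a few common Lex patterns can be translated into MicroLex combinators. This is only a sketch: it assumes the combinators are exported from a module named mlex, and the module name regexp_examples and all function names in it are hypothetical.

```erlang
-module(regexp_examples).
-export([identifier/0, hex_number/0, comment/0, short_number/0]).

%% Assumption: the combinators are exported from a module named mlex.

%% [a-z]|[A-Z]|_
letter() ->
    mlex:'|'([mlex:ci($a, $z), mlex:ci($A, $Z), mlex:c($_)]).

%% [0-9]
digit() ->
    mlex:ci($0, $9).

%% A letter followed by any number of letters or digits.
identifier() ->
    mlex:'@'([letter(), mlex:'*'(mlex:'|'([letter(), digit()]))]).

%% 0x([0-9]|[a-f])+
hex_number() ->
    mlex:'@'([mlex:str("0x"),
              mlex:'+'(mlex:'|'([digit(), mlex:ci($a, $f)]))]).

%% ^#[^\n]*
comment() ->
    mlex:'@'([mlex:sol(mlex:c($#)), mlex:'*'(mlex:nc($\n))]).

%% [0-9]{1,3}
short_number() ->
    mlex:btw(1, 3, digit()).
```

Each function returns a regexp that can be used as the Regexp element of a rule.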
The scanner output is a list of tokens. The list ends with a user-defined end token, or with $end for yecc compatibility.
The scanner can be used in batch mode (scan/3), where the whole input buffer is processed and a list of tokens is returned, or in continuation style (scan_token/3 followed by scan_token/2).
If a rule's format function returns a list, that list is appended to the output list. Any other result is added to the output list as a single element. A format function can return the empty list [] if the rule's result should not be present in the output.
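The three kinds of format function results described above can be illustrated with a small self-contained module. It is a sketch under stated assumptions: the module name format_fun_example, the function names, and the {Class, Line, String} token layout are hypothetical, and collect/2 only mimics the described behaviour; it is not the library's code.

```erlang
-module(format_fun_example).
-export([yecc_token/3, skip/3, signed/3, collect/2]).

%% Returns a single token, which is added to the output list.
yecc_token(Class, Line, String) ->
    {Class, Line, String}.

%% Returns the empty list, so the rule leaves nothing in the output.
skip(_Class, _Line, _String) ->
    [].

%% Returns a list of two tokens for a signed number such as "-42";
%% an unsigned number yields a single token.
signed(Class, Line, [Sign | Digits]) when Sign =:= $+; Sign =:= $- ->
    [{sign, Line, [Sign]}, {Class, Line, Digits}];
signed(Class, Line, Digits) ->
    {Class, Line, Digits}.

%% Illustration only: merge one format function result into the
%% tokens collected so far, following the description above.
collect(Result, Tokens) when is_list(Result) ->
    Tokens ++ Result;
collect(Result, Tokens) ->
    Tokens ++ [Result].
```

For example, collect(skip(ws, 1, " "), Tokens) leaves Tokens unchanged, while collect(signed(integer_num, 2, "-42"), Tokens) appends two tokens.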
On a syntax error the scanner returns the tuple {error, Error}, where Error is a scanError().
Exported Functions | |
---|---|
'*'/1 | Match zero or more appearances of regexp. |
'+'/1 | Match one or more appearances of regexp. |
'.'/0 | Match any character excluding new line. |
'?'/1 | Match one or zero appearances of regexp. |
'@'/1 | Regexps concatenation. |
btw/3 | Match from From to To appearances of regexp. |
c/1 | Match character C. |
ci/1 | Match any character in list. |
ci/2 | Match any character in range From-To. |
cni/1 | Match any character excluding chars in list. |
cni/2 | Match any character excluding chars in range From-To. |
eol/1 | Match regexp at the end of line. |
format_error/1 | |
grammar/1 | The same as grammar/2 but uses the default terminating token '$end'. |
grammar/2 | Compile list of Rules to internal grammar representation. |
match/1 | |
match/2 | |
nc/1 | Match any character excluding C. |
nmatch/1 | |
nmatch/2 | |
scan/3 | Scans whole Buffer and returns list of tokens or error. |
scan_token/2 | The same as scan_token/3 but uses buffer returned by scan_token/2 or scan_token/3. |
scan_token/3 | Scans Buffer for the first recognized token. Must be called first. |
sol/1 | Match regexp at the start of line. |
str/1 | Match string. |
'\|'/1 | Match any regexp from list. |
A scanError() has the fields LineNum, Char, Expect and Str. It can be formatted into a user-friendly string with format_error/1.
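Batch scanning with error reporting might look like the sketch below. It assumes that the scanner module is named mlex, that mlex_str_buf can be passed as ModBuffer, and that Buffer has already been created in whatever form that buffer module expects. The module name scan_example and the functions tokenize/2 and report/1 are hypothetical helpers.

```erlang
-module(scan_example).
-export([tokenize/2, report/1]).

%% Compile Rules and scan the whole Buffer in batch mode.
%% Assumption: the scanner is the module mlex and Buffer is a value
%% that the mlex_str_buf buffer module can operate on.
tokenize(Rules, Buffer) ->
    Grammar = mlex:grammar(Rules),    % default end token '$end'
    report(mlex:scan(mlex_str_buf, Buffer, Grammar)).

%% Turn the scanner result into {ok, Tokens} or {error, Message}.
report({error, Error}) ->
    {error, mlex:format_error(Error)};
report(Tokens) when is_list(Tokens) ->
    {ok, Tokens}.
```

Wrapping the result this way keeps the caller from having to tell a token list apart from an {error, Error} tuple.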
'*'(Node::function()) -> function()
Match zero or more appearances of regexp.
'+'(Node::function()) -> function()
Match one or more appearances of regexp.
'.'() -> function()
Match any character excluding new line.
'?'(Node::function()) -> function()
Match one or zero appearances of regexp.
'@'(Nodes::list()) -> function()
Regexps concatenation.
btw(From, To, Node::function()) -> function()
Match from From to To appearances of regexp
c(C::char()) -> function()
Match character C
ci(Str::string()) -> function()
Match any character in list
ci(From::char(), To::char()) -> function()
Match any character in range From-To
cni(Str::string()) -> function()
Match any character excluding chars in list
cni(From::char(), To::char()) -> function()
Match any character excluding chars in range From-To
eol(Node::function()) -> function()
Match regexp at the end of line
format_error(Arg1) -> term()
grammar(Rules::list()) -> grammar()
The same as grammar/2, but uses the default terminating token '$end'.
See also: grammar/2.
grammar(Rules::list(), EndToken::term()) -> grammar()
Compile the list of Rules to the internal grammar representation.
match(Arg1) -> term()
match(Arg1, Arg2) -> term()
nc(C::char()) -> function()
Match any character excluding C
nmatch(Arg1) -> term()
nmatch(Arg1, Arg2) -> term()
scan(ModBuffer::atom(), Buffer::term(), Grammar::grammar()) -> TokenList | {error, Error}
Scans the whole Buffer and returns a list of tokens or an error. ModBuffer is the name of a buffer module that operates on Buffer; see mlex_str_buf.erl for an example of a buffer module.
See also: grammar/1, grammar/2.
scan_token(Buf::term(), Grammar::grammar()) -> {eof, Cont} | {ok, Cont} | {error, Error}
The same as scan_token/3, but uses the buffer returned by scan_token/2 or scan_token/3.
See also: scan/3, scan_token/3.
scan_token(ModBuffer::atom(), Buffer::term(), Grammar::grammar()) -> {eof, Cont} | {ok, Cont} | {error, Error}
Scans Buffer for the first recognized token. Must be called first; subsequent calls must use scan_token/2.
See also: scan/3, scan_token/3.
sol(Node::function()) -> function()
Match regexp at the start of line
str(Str::string()) -> function()
Match string
'|'(Nodes::list()) -> function()
Match any regexp from list