[erlang-questions] Leex does not support ^ and $ in regexps, is there a workaround?

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Mon Oct 3 16:24:12 CEST 2016


On Mon, Oct 3, 2016 at 10:34 AM Metin Akat <akat.metin@REDACTED> wrote:

>
>
> P 2015/11/21 02:18:02 USD 1.1 EUR
>
>
> So my question is: How do I tackle this? Do I just accept "P" as a WORD
> token and somehow instruct yecc to parse based on the WORD's value? Is it
> even possible to do?
>
>
(This is loosely from memory)

^ and $ are not implemented because they are not needed when a scanner
feeds an LALR(1) parser: the scanner only has to emit tokens, and the
grammar decides what each token means from where it appears. We want the
above line to be scanned into

[{cmd, "P"}, {int, 2015}, '/', {int, 11}, '/', {int, 21}, {int, 2}, ':', ...
 {id, "USD"}, {float, 1.1}, {id, "EUR"}]
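
A minimal leex sketch that could produce a token list like the one above.
The file name, the macro names and the choice to treat a lone "P" as the
command are assumptions for illustration, not something from the original
question. Note that real leex tokens normally carry the line number as
well, so they look like {int, 1, 2015} rather than {int, 2015}.

```erlang
%% ledger_lexer.xrl (hypothetical file name)
Definitions.

D  = [0-9]
U  = [A-Z]
WS = [\s\t\r\n]

Rules.

%% The float rule wins over the int rule on "1.1" because leex
%% prefers the longest match.
{D}+\.{D}+ : {token, {float, TokenLine, list_to_float(TokenChars)}}.
{D}+       : {token, {int, TokenLine, list_to_integer(TokenChars)}}.

%% A lone "P" is the command. On an equal-length match the earlier
%% rule wins, so this rule has to come before the general {U}+ rule.
P          : {token, {cmd, TokenLine, TokenChars}}.
{U}+       : {token, {id, TokenLine, TokenChars}}.

%% Punctuation becomes a bare {'/', Line} or {':', Line} token.
[/:]       : {token, {list_to_atom(TokenChars), TokenLine}}.

%% Whitespace, including newlines, is dropped.
{WS}+      : skip_token.

Erlang code.
```

After leex:file("ledger_lexer.xrl") and compiling the generated module,
ledger_lexer:string("P 2015/11/21 02:18:02 USD 1.1 EUR") should return
{ok, Tokens, EndLine}. No ^ anchor is needed: the "P" is recognized as a
token of its own, and the grammar decides where it may appear.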

Then we can define a yecc-grammar which can turn these into meaningful
constructions:

Command -> Cmd Date Time Currency Amount Currency
  : {command, '$1', '$2', '$3', {'$4', '$5', '$6'}}.

Date -> Year '/' Month '/' Day : {'$1', '$3', '$5'}.
Year -> int : '$1'.
Month -> int : '$1'.
...
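
For reference, a more complete yecc sketch along the same lines. It
assumes tokens of the usual leex shape {Category, Line, Value}. The file
name, the lowercase symbol names, the time and amount rules and the
value/1 helper are all assumptions made to get a self-contained file;
the original leaves those parts out.

```erlang
%% ledger_parser.yrl (hypothetical file name)
Nonterminals command date time amount.
Terminals cmd int float id '/' ':'.
Rootsymbol command.

%% P 2015/11/21 02:18:02 USD 1.1 EUR
command -> cmd date time id amount id :
    {command, value('$1'), '$2', '$3',
     {value('$4'), '$5', value('$6')}}.

date -> int '/' int '/' int :
    {value('$1'), value('$3'), value('$5')}.

time -> int ':' int ':' int :
    {value('$1'), value('$3'), value('$5')}.

%% Accept both "1.1" and "1" as an amount.
amount -> float : value('$1').
amount -> int   : value('$1').

Erlang code.

%% Extract the value from a {Category, Line, Value} token.
value({_Category, _Line, Value}) -> Value.
```

Running yecc:file("ledger_parser.yrl") generates ledger_parser.erl, and
ledger_parser:parse(Tokens) then returns {ok, Result} or {error, Reason}.
The position of the cmd token is enforced by the command rule itself,
which is why the scanner needs no notion of "start of line".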

Sometimes the indentation in the file does matter. Then it can be smarter
to code the lexer by hand, or to make a pre-pass over the input file that
inserts markers for newlines etc. In other words, give structure to the
input before actually parsing it. Many languages with indentation-based
scope do this: a pre-pass inserts the scope markers based on newlines and
indentation, and then the scanner takes over and handles a stream which
already has structure.
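
A minimal sketch of such a pre-pass in Erlang. The module name, the marker
atoms indent and dedent, and the {line, Text} wrapper are hypothetical. It
only counts leading spaces; tabs, blank lines and inconsistent
indentation are not handled.

```erlang
-module(indent_prepass).
-export([mark/1]).

%% mark(Lines) takes a list of strings, one per input line without the
%% trailing newline, and returns a flat list of the atoms indent and
%% dedent interleaved with {line, Text} tuples.
mark(Lines) ->
    mark(Lines, [0], []).

%% Stack holds the indentation widths of the currently open scopes;
%% the bottom element 0 is the top-level scope and is never popped.
mark([], Stack, Acc) ->
    %% Close every scope still open at end of input.
    Closing = lists:duplicate(length(Stack) - 1, dedent),
    lists:reverse(Acc, Closing);
mark([Line | Rest], Stack, Acc) ->
    {Spaces, Text} = lists:splitwith(fun(C) -> C =:= $\s end, Line),
    {Stack1, Acc1} = adjust(length(Spaces), Stack, Acc),
    mark(Rest, Stack1, [{line, Text} | Acc1]).

%% Deeper than the current scope: open a new one.
adjust(Width, [Top | _] = Stack, Acc) when Width > Top ->
    {[Width | Stack], [indent | Acc]};
%% Shallower: close scopes until the widths line up again.
adjust(Width, [Top | Below], Acc) when Width < Top ->
    adjust(Width, Below, [dedent | Acc]);
%% Same width: stay in the current scope.
adjust(_Width, Stack, Acc) ->
    {Stack, Acc}.
```

For example, indent_prepass:mark(["a", "  b", "  c", "d"]) gives
[{line,"a"}, indent, {line,"b"}, {line,"c"}, dedent, {line,"d"}]. The
markers can then be handed to the scanner as ordinary tokens, so the
grammar sees explicit scope delimiters instead of whitespace.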