[erlang-questions] Leex does not support ^ and $ in regexps, is there a workaround?

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Mon Oct 3 16:46:06 CEST 2016


Just scan it as {id, "P"} and then let the parser figure out whether it is
valid in that position. In this format you may want to keep the newlines
explicit in the tokenized output, since they seem to be significant. In some
programs you have a fixed set of valid keywords, in which case you can write a
function:

keyword("P") -> {cmd, "P"};
keyword("U") -> {cmd, "U"};
...
keyword(ID) -> {id, ID}.

but note that this means that P and U can then only occur in the input as
special markers, never as ordinary identifiers. You often use this for
constructions such as 'if...then...else' in a typical programming language:
you want those words scanned specially, not as general identifiers (i.e.,
variables and other stuff).
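
A minimal sketch of how that function could be applied, assuming the
scanner first emits every word as {id, Name}. The module name kw_sketch and
the helpers classify/1 and retag/1 are made up for illustration; only the
"P" and "U" clauses come from the example above.

```erlang
-module(kw_sketch).
-export([keyword/1, classify/1]).

%% Reserved words become {cmd, _}; everything else stays an identifier.
keyword("P") -> {cmd, "P"};
keyword("U") -> {cmd, "U"};
keyword(ID) -> {id, ID}.

%% Hypothetical helper: re-tag every {id, Name} token in a token list,
%% leaving all other tokens untouched.
classify(Tokens) ->
    [retag(T) || T <- Tokens].

retag({id, Name}) -> keyword(Name);
retag(Other) -> Other.
```

In a leex file the same function can probably be called directly from the
rule action, along the lines of {token, keyword(TokenChars)}. Note that
tokens handed to yecc normally also carry the line number, which this
thread leaves out for brevity.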



On Mon, Oct 3, 2016 at 4:37 PM Metin Akat <akat.metin@REDACTED> wrote:

> [{cmd, "P"}, {int, 2015}, '/', {int, 11}, '/', {int, 21}, {int, 2}, ':',
> ...
>  {id, "USD"}, {float, 1.1}, {id, "EUR"}]
>
> In this case, how does your lexer know to parse the "P" to {cmd, "P"} and
> the "EUR" to {id, "EUR"}? The only way I can think of is to check if the P
> is in the beginning of the line (which would totally suffice)
>
> Otherwise, yeah... if I am to write my own lexer... then I guess my whole
> question is pointless.
>
> On Mon, Oct 3, 2016 at 5:24 PM, Jesper Louis Andersen <
> jesper.louis.andersen@REDACTED> wrote:
>
>
>
> On Mon, Oct 3, 2016 at 10:34 AM Metin Akat <akat.metin@REDACTED> wrote:
>
>
>
> P 2015/11/21 02:18:02 USD 1.1 EUR
>
>
> So my question is: How do I tackle this? Do I just accept "P" as a WORD
> token and somehow instruct yecc to parse based on the WORD's value? Is it
> even possible to do?
>
>
> (This is loosely from memory)
>
> The reason ^ and $ are not implemented is that they are never needed in
> an LALR(1) parser/scanner construction. We want the above line to be
> scanned into
>
> [{cmd, "P"}, {int, 2015}, '/', {int, 11}, '/', {int, 21}, {int, 2}, ':',
> ...
>  {id, "USD"}, {float, 1.1}, {id, "EUR"}]
>
> Then we can define a yecc-grammar which can turn these into meaningful
> constructions:
>
> Command -> Cmd Date Time Currency Amount Currency
>   : {command, $1, $2, $3, {$4, $5, $6}}.
>
> Date -> Year '/' Month '/' Day : {$1, $3, $5}.
> Year -> int : $1.
> Month -> int : $1.
> ...
>
> Sometimes, the indentation in the file does matter. But then it can be
> smarter to code the lexer by hand or pre-pass over the input file and
> insert markers for newlines etc. In other words, give structure to the
> input before actually parsing it. This is used in many languages that use
> indentation-based scope: a pre-pass inserts the scope markers based on
> newlines and indentation. Then the scanner takes over and handles the
> stream which has structure.
>
>
>
>
>
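
The pre-pass idea from the quoted message can be sketched roughly as
follows, assuming the whole input is available as a flat string. The module
name prepass_sketch, the function mark_lines/1 and the marker text "@BOL "
are all made up for illustration; any marker that cannot otherwise occur in
the input would do.

```erlang
-module(prepass_sketch).
-export([mark_lines/1]).

%% Split the input on newlines and put an explicit marker in front of
%% every non-empty line, so that a scanner without ^ support can still
%% see where a line begins. Empty lines are dropped.
mark_lines(Input) ->
    Lines = string:split(Input, "\n", all),
    Marked = ["@BOL " ++ L || L <- Lines, L =/= ""],
    lists:flatten(lists:join("\n", Marked)).
```

The scanner then only needs an ordinary rule for the marker, and the
grammar can require that a command follows it.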