[erlang-questions] Leex does not support ^ and $ in regexps, is there a workaround?

Metin Akat
Mon Oct 3 16:55:58 CEST 2016


Ah, right, now I get what you mean. Do some preprocessing between lexing
and parsing. Yes, that way I think it'll work. Thanks!
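
A minimal sketch of such a pass between lexing and parsing. It assumes the lexer emits the two-element tokens used in this thread plus an explicit `newline` atom at every line break. The module and function names are made up for illustration, and nothing here is part of leex or yecc:

```erlang
-module(token_pass).
-export([mark_commands/1]).

%% Turn the first identifier on each line into a command token.
%% Assumption: line breaks show up in the token list as the atom `newline`.
mark_commands(Tokens) ->
    mark(Tokens, true).

%% The second argument is true while we are at the start of a line.
mark([], _AtLineStart) ->
    [];
mark([newline | Rest], _AtLineStart) ->
    %% Keep the newline token and reset the start-of-line flag.
    [newline | mark(Rest, true)];
mark([{id, Name} | Rest], true) ->
    %% An identifier in first position becomes a command.
    [{cmd, Name} | mark(Rest, false)];
mark([Token | Rest], _AtLineStart) ->
    %% Everything else passes through unchanged.
    [Token | mark(Rest, false)].
```

The result can then be handed to the yecc-generated parser, which only ever sees {cmd, _} at the start of a line and {id, _} elsewhere.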

On Mon, Oct 3, 2016 at 5:46 PM, Jesper Louis Andersen wrote:

> Just parse it as {id, "P"} and then use the parser to figure out if this
> is valid in that position. In this format you may want to keep the newlines
> explicit in the tokenized output, since they seem to be significant. In some
> programs you have a set of valid keywords, in which case you can write a
> function:
>
> keyword("P") -> {cmd, "P"};
> keyword("U") -> {cmd, "U"};
> ...
> keyword(ID) -> {id, ID}.
>
> but note that this only works if P and U occur in the input solely as
> special markers and never as ordinary identifiers. You often use this
> in the situation where you have a construction such as 'if...then...else'
> in a typical programming language: you want those parsed specially, not as
> general identifiers (i.e., variables and other stuff).
>
>
>
> On Mon, Oct 3, 2016 at 4:37 PM Metin Akat wrote:
>
>> [{cmd, "P"}, {int, 2015}, '/', {int, 11}, '/', {int, 21}, {int, 2}, ':',
>> ...
>>  {id, "USD"}, {float, 1.1}, {id, "EUR"}]
>>
>> In this case, how does your lexer know to parse the "P" to {cmd, "P"} and
>> the "EUR" to {id, "EUR"}? The only way I can think of is to check if the P
>> is at the beginning of the line (which would totally suffice).
>>
>> Otherwise, yeah... if I am to write my own lexer... then I guess my whole
>> question is pointless.
>>
>> On Mon, Oct 3, 2016 at 5:24 PM, Jesper Louis Andersen wrote:
>>
>>
>>
>> On Mon, Oct 3, 2016 at 10:34 AM Metin Akat wrote:
>>
>>
>>
>> P 2015/11/21 02:18:02 USD 1.1 EUR
>>
>>
>> So my question is: How do I tackle this? Do I just accept "P" as a WORD
>> token and somehow instruct yecc to parse based on the WORD's value? Is it
>> even possible to do?
>>
>>
>> (This is loosely from memory)
>>
>> The reason ^ and $ are not implemented is that they are never needed
>> in an LALR(1) parser/scanner construction. We want the above line to be
>> scanned into
>>
>> [{cmd, "P"}, {int, 2015}, '/', {int, 11}, '/', {int, 21}, {int, 2}, ':',
>> ...
>>  {id, "USD"}, {float, 1.1}, {id, "EUR"}]
>>
>> Then we can define a yecc-grammar which can turn these into meaningful
>> constructions:
>>
>> Command -> Cmd Date Time Currency Amount Currency
>>   : {command, '$1', '$2', '$3', {'$4', '$5', '$6'}}.
>>
>> Date -> Year '/' Month '/' Day : {'$1', '$3', '$5'}.
>> Year -> int : '$1'.
>> Month -> int : '$1'.
>> ...
>>
>> Sometimes, the indentation in the file does matter. But then it can be
>> smarter to code the lexer by hand or pre-pass over the input file and
>> insert markers for newlines etc. In other words, give structure to the
>> input before actually parsing it. This is used in many languages that use
>> indentation-based scope: a pre-pass inserts the scope markers based on
>> newlines and indentation. Then the scanner takes over and handles the
>> stream, which now has structure.
>>
>>
>>
>>
>>
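
The indentation pre-pass described in the quoted message could look roughly like the sketch below. It assumes the input has already been split into lines, that indentation uses spaces only, and that the markers are the atoms `indent` and `dedent`. All of these names are hypothetical choices for illustration, not something leex provides:

```erlang
-module(indent_pass).
-export([mark_scopes/1]).

%% Takes a list of lines (strings) and returns the stripped lines as
%% {line, Text}, interleaved with `indent` / `dedent` scope markers.
mark_scopes(Lines) ->
    mark(Lines, [0]).

%% Stack holds the indentation widths of the currently open scopes.
mark([], Stack) ->
    %% Close every scope still open at end of input.
    lists:duplicate(length(Stack) - 1, dedent);
mark([Line | Rest], [Top | _] = Stack) ->
    {Spaces, Text} = lists:splitwith(fun(C) -> C =:= $\s end, Line),
    Width = length(Spaces),
    if
        Text =:= [] ->
            %% Blank lines do not affect scope.
            mark(Rest, Stack);
        Width > Top ->
            [indent, {line, Text} | mark(Rest, [Width | Stack])];
        Width < Top ->
            {Dedents, NewStack} = pop(Width, Stack, []),
            Dedents ++ [{line, Text} | mark(Rest, NewStack)];
        true ->
            [{line, Text} | mark(Rest, Stack)]
    end.

%% Emit one dedent per scope that is deeper than Width.
%% Inconsistent dedents (a width between two open levels) are not
%% reported as errors in this sketch.
pop(Width, [Top | Rest], Acc) when Width < Top ->
    pop(Width, Rest, [dedent | Acc]);
pop(_Width, Stack, Acc) ->
    {Acc, Stack}.
```

For example, the lines "a", "  b", "  c", "d" come out as {line,"a"}, indent, {line,"b"}, {line,"c"}, dedent, {line,"d"}, which a scanner and grammar can then treat like explicit block delimiters.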