[erlang-questions] erl_scan issues

Wed Apr 22 09:13:56 CEST 2015

Thank you Richard for the historical perspective.

On Wed, Apr 22, 2015 at 2:35 AM, Richard A. O'Keefe <ok@REDACTED>
wrote:

> Erlang syntax is adapted from Prolog syntax.
> It is traditional in Prolog parsers to distinguish
> between a "." token such as you might find in
> a.b.[] (the really old-fashioned way to write a list)
> and a ". " token which ends a clause.
> And it turns out that erl_scan:string/3 makes exactly
> the same distinction:  "a. b" contains a ". " (dot) token
> while "a.b" contains a "." ('.') token.  Now a full stop
> at the end of a string is also a dot token, but it has
> text ".".

And a ".%" input also creates a dot token with a "." text.
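
For anyone who wants to check these cases themselves, here is a minimal sketch. It assumes OTP 18 or later, where erl_scan:category/1 and erl_scan:text/1 are available; the module and function names (dot_demo, categories/1) are made up for illustration.

```erlang
-module(dot_demo).
-export([categories/1]).

%% Scan Src with the 'text' option and return a {Category, Text} pair
%% for every token, so 'dot' and '.' tokens can be told apart.
%% Crashes with badmatch if the scanner reports an error.
categories(Src) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Src, 1, [text]),
    [{erl_scan:category(T), erl_scan:text(T)} || T <- Tokens].
```

Calling dot_demo:categories/1 on "a. b", "a.b", "a." and "a.%" should show the behaviour discussed above: a 'dot' token whose text includes the following whitespace character, a plain '.' token, and a 'dot' token with text "." in the last two cases.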

> White space as such is never a token.
>

It is if the scanner receives the 'return_white_spaces' option. This is needed
(together with the 'text' option) if one must be able to recreate the
original source exactly as it was.
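
A rough sketch of that round trip, assuming OTP 18 or later for erl_scan:text/1. The 'return' option is used here as shorthand for returning both whitespace and comment tokens; the module name scan_roundtrip is hypothetical.

```erlang
-module(scan_roundtrip).
-export([roundtrip/1]).

%% Scan Src keeping whitespace and comment tokens ('return') and the
%% original text of every token ('text'), then glue the texts back
%% together. The result is expected to equal Src.
roundtrip(Src) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Src, 1, [return, text]),
    lists:append([erl_scan:text(T) || T <- Tokens]).
```

Note that this only works because the whitespace after a full stop is carried in the text of the 'dot' token itself; no separate whitespace token is emitted for that character.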

> This does not look like a bug at all to me.
>

I can agree that it is a stretch to call it a bug, but it should be better
specified. It is unexpected that some whitespace, especially newline, is
part of the 'dot' token. I would have left it separate and had a special
case for the opposite operation, i.e. add a whitespace after a 'dot' when
reconstructing the source (if the token list doesn't already include
whitespace tokens).
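
To make the alternative concrete, here is a small sketch of a post-processing step that splits the trailing whitespace off the 'dot' token. This is not how erl_scan behaves; it is only an illustration of the separation I have in mind. It assumes OTP 18 or later, and the names dot_split and pieces/1 are invented. It works on {Category, Text} pairs rather than real tokens, so it does not attempt to compute a location for the split-off whitespace.

```erlang
-module(dot_split).
-export([pieces/1]).

%% Scan Src and return {Category, Text} pairs in which a 'dot' token
%% never carries whitespace: ". " becomes a "." dot followed by a
%% separate white_space piece.
pieces(Src) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Src, 1, [return, text]),
    lists:flatmap(fun piece/1, Tokens).

piece(Token) ->
    case {erl_scan:category(Token), erl_scan:text(Token)} of
        %% A dot whose text is "." plus exactly one more character.
        {dot, [$., Ws]} -> [{dot, "."}, {white_space, [Ws]}];
        {Category, Text} -> [{Category, Text}]
    end.
```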

best regards,
Vlad

> I will say that it would be nice if the
> http://www.erlang.org/doc/man/erl_scan.html
> page contained or linked to an explicit statement
> of what the tokens ARE.
>
> At a minimum, the type category() should be a bit
> more explicit than "atom()".
>
> For that matter,
> http://www.erlang.org/doc/man/erl_parse.html
> should contain or link to an explicit statement
> of what the grammar is.
>
>
>