[erlang-bugs] R16 breaks dots
Richard Carlsson
carlsson.richard@REDACTED
Sat Mar 30 23:53:43 CET 2013
On 2013-03-30 10:42, Anthony Ramine wrote:
> I do want to know why dots aren't allowed in atoms anymore
> and would like to see them back too.
As Fred already mentioned, this feature was added as part of the
"packages" and was removed along with them.
> It was pretty useful to be able to write unquoted fully-qualified
> node names in the prompt, e.g. foo@REDACTED
I think that many agree on this, and maybe the OTP team can be convinced
to take this part back. It should be pretty simple to extract the
relevant code from the commit that removes packages.
> Furthermore, it feels to me like their removal was a mistake, as
> demonstrated by this:
>
> 1> foo.bar.
> * 1: syntax error before: '.'
> 1> foo. bar.
> foo
> 2> bar.
> bar
>
> What you can see here is that the blanks after a dot are still
> mandatory to properly parse a '.' character as a 'dot' token,
> terminating an expression in the shell (or a form in a module), this
> was mandatory to distinguish dot terminators from dots in atoms.
>
> If dots are really to not be allowed anymore in atoms, the blanks
> should be made optional, to be consistent with the rest of the
> language where blanks are optional before or after a symbol (with the
> notable exception of a match '=' followed by a binary literal
> '<<...>>').
This is not quite how the grammar works. First of all, the 'dot' token
is identified as a "." followed by whitespace or a comment or EOF, and
the packages addition did not change that. However, periods that are not
a dot token or part of any other token are seen as '.' tokens. For example:
1> erl_scan:string("foo.bar. ").
{ok,[{atom,1,foo},{'.',1},{atom,1,bar},{dot,1}],1}
2> erl_scan:string("foo. bar. ").
{ok,[{atom,1,foo},{dot,1},{atom,1,bar},{dot,1}],1}
Now, the Erlang parser works on complete "forms" at a time - these are
the token sequences that are terminated by dot tokens. In the first
case, you have one form containing three tokens. In the second case, you
have two forms containing one token each. Blanks cannot be made optional
after periods, because you must be able to distinguish between token
sequences like these.
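
To make the splitting into forms concrete, here is a minimal sketch that scans a string with erl_scan:string/1 and cuts the token list at each dot token. The module and function names (forms_sketch, split_forms/1) are made up for illustration, and this is not how the shell or epp is actually implemented; it only mirrors the rule described above. The dot tokens themselves are dropped so that the counts match the description.

```erlang
-module(forms_sketch).
-export([split_forms/1]).

%% Hypothetical helper: scan String and return one token list per form.
%% A form ends at each {dot, _} token; the dot itself is not kept.
split_forms(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    split(Tokens, [], []).

%% End of input with nothing pending.
split([], [], Forms) ->
    lists:reverse(Forms);
%% End of input with leftover tokens and no terminating dot:
%% keep them as a final, incomplete form.
split([], Acc, Forms) ->
    lists:reverse([lists:reverse(Acc) | Forms]);
%% A dot token closes the current form.
split([{dot, _} | Rest], Acc, Forms) ->
    split(Rest, [], [lists:reverse(Acc) | Forms]);
%% Any other token, including a bare '.', stays inside the form.
split([Token | Rest], Acc, Forms) ->
    split(Rest, [Token | Acc], Forms).
```

With this, forms_sketch:split_forms("foo.bar. ") gives a single form of three tokens, while forms_sketch:split_forms("foo. bar. ") gives two forms of one token each.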
It's also the case that you can't just change the scanning of atoms to
allow periods as part of the atom token - in that case, the scanner
would report a single atom for "foo.bar" instead of three tokens 'foo'
'.' 'bar', and then the grammar would not be able to identify phrases
like "Rec#foo.bar" or "#foo.bar". To support dotted atoms, the packages
added a grammar rule that allowed a sequence <atom> '.' ... <atom> to be
merged into a single atom unless it was part of another rule such as '#'
<atom> '.' <atom>. (I think that Haskell had to do some similar tricks
with their grammar to allow dotted names.) This could easily be put back
in there. But at no point has it been the case in Erlang that unquoted
atom tokens could contain periods.
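
As a rough illustration of that merging idea, here is a sketch that does it as a pass over the token list instead of as a grammar rule. This is an assumption made for the sake of a short example: the packages code did it in the parser grammar, and the names here (dotted_sketch, merge/1) are hypothetical. It only treats '#' <atom> '.' <atom> specially and ignores every other context a real grammar would have to consider.

```erlang
-module(dotted_sketch).
-export([merge/1]).

%% Hypothetical helper: scan String, then merge <atom> '.' <atom> runs
%% into one dotted atom, except directly after '#' (record syntax).
merge(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    merge_tokens(Tokens).

%% '#' <atom> '.' <atom> is a record field reference: leave it alone.
merge_tokens([{'#', _} = Hash, {atom, _, _} = Rec,
              {'.', _} = Period, {atom, _, _} = Field | Rest]) ->
    [Hash, Rec, Period, Field | merge_tokens(Rest)];
%% <atom> '.' <atom>: merge, then look again so longer chains collapse.
merge_tokens([{atom, Loc, A}, {'.', _}, {atom, _, B} | Rest]) ->
    Merged = list_to_atom(atom_to_list(A) ++ "." ++ atom_to_list(B)),
    merge_tokens([{atom, Loc, Merged} | Rest]);
%% Everything else passes through unchanged.
merge_tokens([Token | Rest]) ->
    [Token | merge_tokens(Rest)];
merge_tokens([]) ->
    [].
```

Here dotted_sketch:merge("foo.bar.baz. ") yields one atom token 'foo.bar.baz' followed by the dot token, while the tokens of "Rec#foo.bar. " come back unchanged, so the record rule still sees 'foo' '.' 'bar'.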
/Richard