[erlang-questions] EEP 40 - A proposal for Unicode variable and atom names in Erlang.
Richard O'Keefe
ok@REDACTED
Sun Nov 4 22:24:30 CET 2012
On 2/11/2012, at 10:35 PM, Raimo Niskanen wrote:
>
> Also I do not clearly see what problem is solved for someone using
> fonts with, say, Arabic letters but not, say, the underline, by revising
> the underscore rule. Bear with me. I have never used another keyboard
> than Swedish or English. Is it so that when using such a font there
> is no Pc character available except for the "_" (and why is that
> available?) so there must be a possibility to express both non-singleton
> and maybe-singleton variables using just the "_"?
I have only tried the Macintosh interface, where three virtual
keyboards are available: "Arabic", "Arabic - PC", and "Arabic - QWERTY".
All of them have the underline. ISO 8859-6 (the ISO 8-bit character
set for Arabic) includes all of ASCII. However, I am not an expert.
>
> I have realized that. I wanted a lesser degree of understanding the
> lexical semantics: If it passes the compiler (which that example
> does not) I would like to be able to see which identifiers are
> variables and which are atoms.
>
> Also, e.g. someone writing a syntax highlighter for Vim, I guess, would
> appreciate a simple rule for how to recognize a variable.
Well, the EEP gives them _that_. If Vim can highlight Ada and Python
and Java correctly, what's the problem? Copy the regular expressions
it uses for Java and tinker with them.
>
>>
>> If someone gives you an Erlang file written entirely in ASCII,
>> but using the Klingon language, just how much would it help you
>> to know where the variables began? (Google Translate offers
>> translation to Esperanto, why not Klingon? I haven't opened my
>> copy of the how-to-learn-Klingon book in 20 years. Sigh.)
>
> It would not help much, I agree. But if for example I get a bug report
> about the compiler or runtime system not doing right for a few lines
> of Klingon Erlang, it would be helpful to easily distinguish variables
> from atoms.
You don't have to do it by eye. You can use a tool (like the Vim
syntax colourer you mention above).
>> Consider
>> 1> X = a.B.
>> * 1: syntax error before: B
>> 1> X = a._2.
>> * 1: syntax error before: _2
>> 1> X = a.3.
>> * 1: syntax error before: 3
>> 1> X = a.b.
>> 'a.b'
>>
>> That tells us that currently, only Ll characters are allowed
>> after a dot in the continuation of an identifier. That naturally
>> generalised to (Ll ∪ Lo). So I made "what can follow a dot" the
>> same everywhere in an atom. The mental model I had was to think
>> of dot-followed-by-Ll-or-Lo as a single extended character.
>
> Yes. And currently only Ll characters are allowed at the start
> of an atom. So currently the same set is allowed at the start
> as after a ".".
>
> Your current suggestion allows a.ª as an unquoted atom since the character
> after the dot is in Lo, but it is not allowed in Erlang today.
Oh DRAT!
>
> It also allows ᛮᛯᛰ as an atom but not ᛮᛯᛰ.ᛮᛯᛰ since these characters
> are in Nl (Letter_Number), which is part of XID_Start.
Frankly that one doesn't bother me in the least.
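For anyone who wants to check the categories being argued about, here is a small Python sketch. It is only an illustration: the `describe` helper and the sample list are made up for this message, and the results depend on the Unicode version bundled with the Python in use (ª and º were reclassified to Lo in Unicode 6.1).

```python
import unicodedata

# Characters mentioned in the discussion above.
samples = ["a", "B", "_", "3", "\u00aa", "\u00ba", "\u16ee", "\u16ef", "\u16f0"]

def describe(ch):
    """Return (code point, general category, in XID_Start?) for one character."""
    # str.isidentifier() on a single character is true for XID_Start plus "_",
    # so "_" is excluded explicitly to approximate XID_Start alone.
    xid_start = ch != "_" and ch.isidentifier()
    return ("U+%04X" % ord(ch), unicodedata.category(ch), xid_start)

for ch in samples:
    print(ch, *describe(ch))
```
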
>
> So I think the mental model should be that after a dot there
> should be as if a new atom was starting.
However, since I've got to fix the a.ª bug, I may as well adopt
your suggestion. The grammar now reads
    unquoted_atom ::= "."? atom_start atom_continue*
    atom_start    ::= XID_Start \ (Lu ∪ Lt ∪ "ªº")
    atom_continue ::= XID_Continue ∪ "@" \ "ªº"
                   |  "." atom_start
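
A rough way to try this grammar out is the Python sketch below. It is not the EEP's reference implementation, and it makes some assumptions: `atom_continue` is read as (XID_Continue ∪ "@") \ "ªº", XID_Start and XID_Continue are approximated through `str.isidentifier()`, and all function names are invented for the illustration.

```python
import unicodedata

EXCLUDED = {"\u00aa", "\u00ba"}  # the characters ª and º

def is_xid_start(ch):
    # str.isidentifier() also accepts "_", which is not in XID_Start.
    return ch != "_" and ch.isidentifier()

def is_xid_continue(ch):
    # A character is XID_Continue if it may follow a valid start character.
    return ("a" + ch).isidentifier()

def is_atom_start(ch):
    # atom_start ::= XID_Start \ (Lu ∪ Lt ∪ "ªº")
    return (is_xid_start(ch)
            and unicodedata.category(ch) not in ("Lu", "Lt")
            and ch not in EXCLUDED)

def is_atom_continue(ch):
    # Assumed reading: (XID_Continue ∪ "@") \ "ªº"
    return (is_xid_continue(ch) or ch == "@") and ch not in EXCLUDED

def is_unquoted_atom(s):
    # unquoted_atom ::= "."? atom_start atom_continue*
    i = 1 if s.startswith(".") else 0
    if i >= len(s) or not is_atom_start(s[i]):
        return False
    i += 1
    while i < len(s):
        if s[i] == ".":
            # A dot must be followed by something that could start an atom.
            if i + 1 >= len(s) or not is_atom_start(s[i + 1]):
                return False
            i += 2
        elif is_atom_continue(s[i]):
            i += 1
        else:
            return False
    return True

for text in ["a.b", "a.B", "a._2", "a.3", "a.\u00aa", "\u16ee\u16ef\u16f0.\u16ee\u16ef\u16f0"]:
    print(repr(text), is_unquoted_atom(text))
```

Under these assumptions the four shell examples quoted earlier behave as they do today (only a.b is accepted), a.ª is rejected, and ᛮᛯᛰ.ᛮᛯᛰ is accepted because the character after the dot is now checked against atom_start.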