[erlang-questions] EEP 40 - A proposal for Unicode variable and atom names in Erlang.

Sun Nov 4 22:24:30 CET 2012

On 2/11/2012, at 10:35 PM, Raimo Niskanen wrote:
> 
> Also I do not clearly see what problem is solved for someone using
> fonts with say Arabic letters but not say the underline, by revising
> the underscore rule. Bear with me. I have never used another keyboard
> than Swedish or English. Is it so that when using such a font there
> is no Pc character available except for the "_" (and why is that
> available?) so there must be a possibility to express both non-singleton
> and maybe-singleton variables using just the "_"?

I have only tried the Macintosh interface, where three virtual
keyboards are available: "Arabic", "Arabic - PC", and "Arabic - QWERTY".
All of them have the underline.  ISO 8859-6 (the ISO 8-bit
character set for Arabic) includes all of ASCII.  However, I am not an
expert.
> 
> I have realized that. I wanted a lesser degree of understanding the
> lexical semantics: If it passes the compiler (which that example
> does not) I would like to be able to see which identifiers are
> variables and which are atoms.
> 
> Also, e.g. someone writing a syntax highlighter for Vim, I guess, would
> appreciate a simple rule for how to recognize a variable.

Well, the EEP gives them _that_.  If Vim can highlight Ada and Python
and Java correctly, what's the problem?  Copy the regular expressions
it uses for Java and tinker with them.
> 
>> 
>> If someone gives you an Erlang file written entirely in ASCII,
>> but using the Klingon language, just how much would it help you
>> to know where the variables began?  (Google Translate offers
>> translation to Esperanto, why not Klingon?  I haven't opened my
>> copy of the how-to-learn-Klingon book in 20 years.  Sigh.)
> 
> It would not help much, I agree. But if for example I get a bug report
> about the compiler or runtime system not doing right for a few lines
> of Klingon Erlang, it would be helpful to easily distinguish variables
> from atoms.

You don't have to do it by eye.  You can use a tool (like the Vim
syntax colourer you mention above).

>> Consider
>> 1> X = a.B.
>> * 1: syntax error before: B
>> 1> X = a._2.
>> * 1: syntax error before: _2
>> 1> X = a.3.
>> * 1: syntax error before: 3
>> 1> X = a.b.
>> 'a.b'
>> 
>> That tells us that currently, only Ll characters are allowed
>> after a dot in the continuation of an identifier.  That naturally
>> generalised to (Ll ∪ Lo).  So I made "what can follow a dot" the
>> same everywhere in an atom.  The mental model I had was to think
>> of dot-followed-by-Ll-or-Lo as a single extended character.
> 
> Yes. And currently only Ll characters are allowed at the start
> of an atom. So currently the same set is allowed at the start
> as after a ".".
> 
> Your current suggestion allows a.ª as an unquoted atom since the character
> after the dot is in Lo, but it is not allowed in Erlang today.

Oh DRAT!
> 
> It also allows ᛮᛯᛰ as an atom but not ᛮᛯᛰ.ᛮᛯᛰ since these characters
> are in Nl (Letter_Number), which is part of XID_Start.

Frankly that one doesn't bother me in the least.
> 
> So I think the mental model should be that after a dot there
> should be as if a new atom was starting.

However, since I've got to fix the a.ª bug, I may as well adopt
your suggestion.  The grammar now reads

	unquoted_atom ::= "."? atom_start atom_continue*

	atom_start ::= XID_Start \ (Lu ∪ Lt ∪ "ªº")

	atom_continue ::= XID_Continue ∪ "@" \ "ªº"
	               |  "." atom_start
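
For anyone who wants to poke at the revised grammar, here is a rough
sketch of a recogniser in Python.  It is not part of the EEP and makes
several assumptions: the function names are made up for illustration,
XID_Start and XID_Continue are approximated through Python's own
identifier test (str.isidentifier, with "_" removed from the start set),
and the atom_continue rule is read as (XID_Continue ∪ "@") \ "ªº".
Reserved words are not considered at all.

```python
import unicodedata

EXCLUDED = "ªº"  # the two ordinal indicators the grammar removes


def is_xid_start(ch):
    # Assumption: Python's identifier start set is XID_Start plus "_".
    return ch != "_" and ch.isidentifier()


def is_xid_continue(ch):
    # Assumption: a character that may follow "a" in a Python
    # identifier is in XID_Continue.
    return ("a" + ch).isidentifier()


def is_atom_start(ch):
    # atom_start ::= XID_Start \ (Lu ∪ Lt ∪ "ªº")
    return (is_xid_start(ch)
            and unicodedata.category(ch) not in ("Lu", "Lt")
            and ch not in EXCLUDED)


def is_atom_continue_char(ch):
    # First alternative of atom_continue, read as
    # (XID_Continue ∪ "@") \ "ªº"
    return (is_xid_continue(ch) or ch == "@") and ch not in EXCLUDED


def is_unquoted_atom(s):
    # unquoted_atom ::= "."? atom_start atom_continue*
    i = 1 if s.startswith(".") else 0
    if i >= len(s) or not is_atom_start(s[i]):
        return False
    i += 1
    while i < len(s):
        if s[i] == ".":
            # Second alternative: "." atom_start, i.e. after a dot
            # it is as if a new atom were starting.
            if i + 1 >= len(s) or not is_atom_start(s[i + 1]):
                return False
            i += 2
        elif is_atom_continue_char(s[i]):
            i += 1
        else:
            return False
    return True
```

Under those assumptions, a.b is accepted while a.B, a._2, a.3 and a.ª
are rejected, which matches the shell transcript quoted above, and
ᛮᛯᛰ.ᛮᛯᛰ is accepted because the text after a dot is now checked with
the same atom_start rule as the beginning of the atom.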