[erlang-questions] A proposal for Unicode variable and atom names in Erlang.

Tue Oct 23 11:20:19 CEST 2012

On Oct 19, 2012, at 8:06 AM, Richard O'Keefe <ok@REDACTED> wrote:

> If it were still possible to submit EEPs in plain text,
> this would be an EEP.  If someone else would like to
> package this up as an EEP and submit it (under their
> name, mine, or both), feel free.
> 

[…] Snip the rest of the EEP proposal.

So now, I've taken the time to read through the proposal. In general I like it since it seems to be a conservative extension to what we already have. Yet, there are two points which I would like to have your opinion on:

Google Go takes a different stance on two points:

* There is *no* normalization. This means that you can write the same symbol either as one code point or as two code points that combine into the same rendering, and the two spellings are then different identifiers. Of course this is the conservative stance, where it is expected that people do not do silly things. But my guess is that it is much easier to handle. Is there a specific reason to pick normalization, apart from the obvious one? I see some similarities to tabs vs spaces for indentation here.

* In Go, identifiers are exported if they begin with a code point in class Lu. This is also a very conservative stance: if we simply ported that solution to Erlang, variable names would have to begin with an Lu code point. But again it is quite simple, and very easy to handle from a parser perspective.
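
To make the two points concrete, here is a minimal Go sketch using only the standard library. It is an illustration, not how the Go compiler or any Erlang parser is actually implemented. The helper names sameSpelling and startsWithLu are made up for this example. The first compares two spellings code point by code point, with no normalization applied; the second checks whether a name begins with a code point in class Lu.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// sameSpelling (hypothetical helper) compares two names exactly as written,
// i.e. without any Unicode normalization.
func sameSpelling(a, b string) bool {
	return a == b
}

// startsWithLu (hypothetical helper) reports whether the first code point
// of name is in Unicode general category Lu (uppercase letter).
func startsWithLu(name string) bool {
	r, size := utf8.DecodeRuneInString(name)
	if size == 0 || r == utf8.RuneError {
		return false // empty or invalid UTF-8
	}
	return unicode.Is(unicode.Lu, r)
}

func main() {
	precomposed := "\u00e9" // e-acute as a single code point
	decomposed := "e\u0301" // 'e' followed by a combining acute accent

	// Without normalization the two spellings are different strings.
	fmt.Println(sameSpelling(precomposed, decomposed))
	fmt.Println(utf8.RuneCountInString(precomposed), utf8.RuneCountInString(decomposed))

	// The Lu test is a single lookup on the first code point.
	for _, name := range []string{"Count", "count", "\u00c4rger", "\u00e4rger", "_x"} {
		fmt.Printf("%q starts with Lu: %v\n", name, startsWithLu(name))
	}
}
```

Both checks only look at the code points as they were typed, which is why this stance is cheap for a parser. The price is that the precomposed and decomposed spellings of the same visible name are not treated as equal.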

I am not saying that the proposal is bad, mind you. I am just trying to get an opinion on the above two stances. I do feel the Pc class is an elegant way of handling backwards compatibility while still allowing some slack going forward.

Jesper Louis Andersen
  Erlang Solutions Ltd., Copenhagen