[erlang-questions] A proposal for Unicode variable and atom names in Erlang.

Tue Oct 23 12:43:53 CEST 2012

On Tue, Oct 23, 2012 at 11:20 AM, Jesper Louis Andersen
<jesper.louis.andersen@REDACTED> wrote:

> Google Go takes two stances differently:
>
> * There is *no* normalization. This means that you can write the same symbol using one codepoint or with two code points combining into the same representation. Of course this is the conservative stance where it is expected that people do not do silly things. But my guess is that it is much easier to handle. Is there a specific reason to pick normalization, apart from the obvious one? I see some similarities to tabs vs spaces for indentation here.

These are the obvious reasons I can think of:

- It may not be easy for people to choose which normalization, or lack
of normalization, is used by their preferred editor or by their input
method. A piece of code not written by me can be in a normalization
state different from the one used by my editor; to check it I must
examine the text at byte level or use a tool, and even then it may be
impossible to establish with certainty.

- It's just crazy to not normalize the source text of a program, in
any language.

- Better: it's crazy to have Unicode text not normalized to a known
form, in any application which does more than pass around the text
untouched.
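
To make the first point concrete, here is a minimal sketch in Python
(chosen only for illustration; it uses the standard unicodedata
module). The identifier "café" and the helper names same_identifier
and normalization_state are made up for this example. It shows two
spellings of one visible name that differ at the code point level,
compare equal once both are brought to the same form, and can be
probed for the forms they already satisfy.

```python
import unicodedata

# Two spellings of the same visible identifier (hypothetical example name).
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def same_identifier(a, b, form="NFC"):
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def normalization_state(text):
    """List the standard forms that the text already satisfies."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return [f for f in forms if unicodedata.normalize(f, text) == text]


if __name__ == "__main__":
    # The raw strings differ even though they render identically.
    print(precomposed == decomposed)
    # After normalization they are the same name.
    print(same_identifier(precomposed, decomposed))
    # Each spelling satisfies a different set of forms.
    print(normalization_state(precomposed))
    print(normalization_state(decomposed))
    # Plain ASCII satisfies every form, so it reveals nothing
    # about what the editor that produced it would do.
    print(normalization_state("abc"))
```

The last call is the "impossible to establish with certainty" case:
text containing only ASCII is already in all four forms, so looking at
it tells me nothing about which form the editor that wrote it would
emit for an accented name.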

P.
