[erlang-questions] A proposal for Unicode variable and atom names in Erlang.

Tue Oct 30 05:18:19 CET 2012

On 22/10/2012, at 8:33 PM, Michael Uvarov wrote:

> What is the problem about unicode variables is that some characters
> are not equal: Х != X, but they look the same.

This would be a persuasive argument IF
(a) we did not already allow look-alike pairs such as XO and X0
    (letter O versus digit zero) or Xl and X1 (letter l versus digit one);
(b) mixed scripts in a single token were plausible.
Neither is the case.
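
To make both points concrete, here is a small sketch.  It is in Python
only because its standard library exposes the Unicode character
database; nothing in it is Erlang-specific.  The helpers script_of and
scripts_in are hypothetical names of my own, and guessing the script
from the first word of the character name is a rough assumption, not
the real Unicode Script property.

```python
import unicodedata

def script_of(ch):
    """Rough script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]

def scripts_in(token):
    """Set of rough script labels for the letters in a token."""
    return {script_of(ch) for ch in token if ch.isalpha()}

latin_x = "X"            # U+0058 LATIN CAPITAL LETTER X
cyrillic_ha = "\u0425"   # U+0425 CYRILLIC CAPITAL LETTER HA

# The two letters look alike but are different code points.
same = (latin_x == cyrillic_ha)

# ASCII already has look-alike names that are distinct tokens.
ascii_pairs_differ = ("XO" != "X0") and ("Xl" != "X1")

# A token mixing the two letters is detectably mixed-script.
mixed = scripts_in(latin_x + cyrillic_ha)
```

A tool that wanted to warn about suspicious names could flag any token
for which scripts_in returns more than one label.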

> Other problem about unicode is that a lot of algorithms are
> locale-based and difficult (a lot of rules and exceptions).

None of those algorithms applies to the current topic,
except for normalisation, which is not locale-based.
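
A quick way to see that normalisation takes no locale is to look at how
it is called.  The sketch below uses Python's unicodedata.normalize
purely as an illustration; the choice of NFC and NFD here is mine, not
something the proposal is being quoted on.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # letter e followed by COMBINING ACUTE ACCENT

# The two spellings differ as code point sequences...
raw_equal = (composed == decomposed)

# ...but normalisation maps each onto the other form.  The only inputs
# are the form name and the string: there is no locale parameter.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
```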
> 
> Even non-locale based (unified and simple version of to_lower) contains this:
> - Contains additional case mappings that map to more than one
> character, such as "ß" to "SS".

That already applies to Latin-1, which Erlang supports RIGHT NOW.
(Nit-pick: that's an example of to_upper.)
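
For anyone who wants to check this, a minimal sketch (again Python,
used only as a convenient Unicode-aware calculator, not as a statement
about what any Erlang function does):

```python
sharp_s = "\u00df"   # LATIN SMALL LETTER SHARP S

# The character fits in Latin-1 as the single byte 0xDF.
latin1_bytes = sharp_s.encode("latin-1")

# Upper-casing expands one character into two.
upper = sharp_s.upper()

# Lower-casing leaves it alone, which is why this is a to_upper example.
lower = sharp_s.lower()
```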

> - Characters may have case mappings that depend on the locale.
>  For example, in Turkish the letter U+0049 "I" capital letter i
> lowercases to U+0131 "ı" small dotless i.

Indeed.  But since neither variable names nor unquoted atoms are
subjected to any kind of case mapping by the Erlang parser, how
is that relevant _here_?

You're mainly talking about problems with Unicode *data*, and
we have no choice but to deal with those in any case.