[erlang-questions] A proposal for Unicode variable and atom names in Erlang.
Richard O'Keefe
ok@REDACTED
Tue Oct 30 05:18:19 CET 2012
On 22/10/2012, at 8:33 PM, Michael Uvarov wrote:
> The problem with Unicode variables is that some characters
> are not equal but look the same: Х != X.
This would be a persuasive argument IF
(a) we did not already allow both XO and X0, Xl and X1, and so on;
(b) mixed scripts in a single token were plausible.
Neither is the case.
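To make the homoglyph point concrete: Cyrillic Х (U+0425) and Latin X (U+0058) are different code points, and a token that mixes scripts is cheap to detect. The sketch below is a hypothetical check, not part of any proposal. It approximates a character's script by the first word of its Unicode character name, which is only a crude stand-in for the real Unicode Script property (UAX #24).

```python
import unicodedata


def scripts_in(token: str) -> set:
    """Return a crude set of script names for the letters in token.

    Approximation (assumption): the first word of the Unicode character
    name, e.g. 'LATIN', 'CYRILLIC', 'GREEK'. Non-letters are ignored.
    """
    scripts = set()
    for ch in token:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts


def is_mixed_script(token: str) -> bool:
    """Hypothetical lint: flag a token whose letters come from >1 script."""
    return len(scripts_in(token)) > 1


latin_x = "X"          # U+0058 LATIN CAPITAL LETTER X
cyrillic_x = "\u0425"  # U+0425 CYRILLIC CAPITAL LETTER HA (looks like X)

print(latin_x == cyrillic_x)        # False: distinct code points
print(is_mixed_script("Xyz"))       # False: all Latin
print(is_mixed_script("\u0425yz"))  # True: Cyrillic Ха followed by Latin yz
```

Confusables such as XO versus X0 are all Latin and pass such a check, which is exactly the situation Erlang already accepts today.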
> Another problem with Unicode is that a lot of algorithms are
> locale-based and difficult (a lot of rules and exceptions).
None of those algorithms applies to the current topic,
except for normalisation, which is not locale-based.
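Normalisation is a fixed, table-driven transformation with no locale parameter. A minimal sketch, assuming identifiers would be compared after NFC normalisation (NFC rather than NFKC is an assumption here, and same_name is a hypothetical helper):

```python
import unicodedata

composed = "\u00e9"     # é as a single code point, U+00E9
decomposed = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT


def same_name(a: str, b: str) -> bool:
    """Hypothetical: treat two identifiers as equal if their NFC forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # False: different code point sequences
print(same_name(composed, decomposed))  # True: identical after NFC
```

Note that unicodedata.normalize takes only the form and the string; there is nowhere to pass a locale.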
>
> Even the non-locale-based version (the unified, simple to_lower) contains this:
> - Contains additional case mappings that map to more than one
> character, such as "ß" to "SS".
That already applies to Latin-1, which Erlang supports RIGHT NOW.
(Nit-pick: that's an example of to_upper.)
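Python's built-in full case mapping shows this behaviour on a character that is already in Latin-1 (ß, U+00DF):

```python
sharp_s = "\u00df"  # ß, LATIN SMALL LETTER SHARP S

print(sharp_s.encode("latin-1"))  # b'\xdf': representable in Latin-1
print(sharp_s.upper())            # SS: one character upper-cases to two
print(sharp_s.upper().lower())    # ss: the mapping does not round-trip
```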
> - Characters may have case mappings that depend on the locale.
> For example, in Turkish the letter U+0049 "I" capital letter i
> lowercases to U+0131 "ı" small dotless i.
Indeed. But since neither variable names nor unquoted atoms are
subjected to any kind of case mapping by the Erlang parser, how
is that relevant _here_?
You're mainly talking about problems with Unicode *data*, and
we have no choice but to deal with those anyway.
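A tokenizer only needs to know which class the first character of a name belongs to, not what it would map to under case conversion. A hypothetical sketch, assuming (for illustration only) that a token starting with _ or a character of general category Lu or Lt is a variable and one starting with Ll is an unquoted atom; general categories are fixed in the Unicode Character Database and do not depend on locale:

```python
import unicodedata


def token_kind(name: str) -> str:
    """Hypothetical classifier: variable vs unquoted atom by first character.

    No case mapping is performed; the name itself is never altered.
    """
    first = name[0]
    category = unicodedata.category(first)
    if first == "_" or category in ("Lu", "Lt"):
        return "variable"
    if category == "Ll":
        return "atom"
    return "other"


print(token_kind("Xs"))             # variable
print(token_kind("\u0131k"))        # atom: dotless ı (U+0131) is Ll in every locale
print(token_kind("\u0130stanbul"))  # variable: İ (U+0130) is Lu
```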