[erlang-questions] A proposal for Unicode variable and atom names in Erlang.

Michael Uvarov freeakk@REDACTED
Mon Oct 22 09:33:40 CEST 2012


The problem with Unicode variable names is that some characters look
the same but are not equal: Х != X (Cyrillic U+0425 versus Latin U+0058).
Another problem with Unicode is that many of its algorithms are
locale-dependent and complicated (a lot of rules and exceptions).
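
To make the look-alike problem concrete, here is a small sketch in
Python (chosen only for illustration, it is not part of the Erlang
proposal). The helper name describe is made up for this example. It
uses only the standard unicodedata module and shows that the two
letters are different code points, and that NFKC normalization does
not merge them either.

```python
import unicodedata

# Two letters that render almost identically in most fonts.
cyrillic_ha = "\u0425"  # Х, CYRILLIC CAPITAL LETTER HA
latin_x = "\u0058"      # X, LATIN CAPITAL LETTER X


def describe(ch):
    """Hypothetical helper: return the code point and Unicode name of ch."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


print(describe(cyrillic_ha))
print(describe(latin_x))

# A plain code point comparison treats them as different names.
print(cyrillic_ha == latin_x)

# Compatibility normalization leaves the Cyrillic letter unchanged,
# so normalizing identifiers does not remove this ambiguity.
print(unicodedata.normalize("NFKC", cyrillic_ha) == latin_x)
```

So two variables spelled with these letters would be distinct to a
compiler that compares code points, although a reader cannot tell
them apart.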

Even the non-locale-based (unified and simple) version of case mapping,
such as to_lower, comes with this caveat:
- It contains additional case mappings that map to more than one
character, such as "ß" to "SS".
But this case is safe for variable names. The next case is more
interesting. It is from the locale-based version:
- Characters may have case mappings that depend on the locale.
  For example, in Turkish the letter U+0049 "I" (capital letter i)
lowercases to U+0131 "ı" (small dotless i).
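
Both cases can be sketched in a few lines of Python (again only as an
illustration). Python's built-in str.upper and str.lower are
locale-independent, so the Turkish rule has to be applied by hand. The
function turkish_lower below is a hypothetical helper written for this
message, not an ICU or Erlang API, and it covers only the two Turkish
letters mentioned here.

```python
# Locale-independent mapping: one character can become two.
sharp_s = "\u00df"            # ß
print(sharp_s.upper())        # "SS"
print(len(sharp_s.upper()))   # 2

# Locale-independent lowercase of "I" is the dotted "i".
print("I".lower())


def turkish_lower(s):
    """Hypothetical sketch of Turkish lowercasing.

    Maps U+0049 "I" to U+0131 (dotless i) and U+0130 (capital I with
    dot above) to "i" before applying the default lowercase mapping.
    Real implementations such as ICU handle more rules than this.
    """
    s = s.replace("I", "\u0131").replace("\u0130", "i")
    return s.lower()


# The same input gives a different result under the Turkish rule.
print(turkish_lower("I") == "I".lower())
```

The point is that the result of lowercasing the same name depends on
which rule set is used, so a compiler that case-folds identifiers
would have to pick one.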

For example, the full version of the toLower function from ICU and its
data set are described somewhere here:
http://www.unicode.org/reports/tr35/


