[erlang-questions] Atom Unicode Support

Tue Feb 2 08:11:20 CET 2016

On Tuesday, 2 February 2016 09:13:01 Max Lapshin wrote:
> What are the reasons to have unicode atoms?
> 
> To make code non-editable by foreign programmers? Are you ready to edit
> Chinese or Thai?

As someone who works mostly in Japanese on Japanese projects... I have to
admit, while there is a fair amount of Lisp in Japanese, I can't think of
a practical use case where I would want to use Unicode atoms in Erlang.

It would be just as odd as wanting to use capital-letter atoms in Erlang
-- the quoting and upper/lower distinctions are actually quite convenient.
If I were going to use Japanese atoms, I would also want to use Japanese
variable names -- and I would then feel a very strong urge to create the
same distinction:
  - atoms are always katakana
  - variables are anything else that is a letter
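To make that two-line rule concrete, here is a minimal sketch in Python
(not Erlang, and not a proposal for the real scanner). The function names
are made up for illustration, and "katakana" is assumed to mean the
Unicode Katakana block U+30A0..U+30FF:

```python
def is_katakana(ch: str) -> bool:
    # Assumption: "katakana" means the Unicode Katakana block,
    # U+30A0..U+30FF, which includes the long-vowel mark "ー".
    return "\u30a0" <= ch <= "\u30ff"


def classify_japanese_token(token: str) -> str:
    """Hypothetical rule: all-katakana tokens are atoms; tokens that
    start with any other letter are variables."""
    if token and all(is_katakana(ch) for ch in token):
        return "atom"
    if token and token[0].isalpha():
        return "variable"
    return "other"
```

Under this sketch サーバー would be an atom, while 名前 and なまえ would both
be variables, which mirrors the lower-case/upper-case split in ASCII.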

But now we're really opening a can of worms. When I'm in Japanese-writing
mode I actually *expect* my input method to change things like '->' to
→, among *many* other cases where a dramatically more accurate character
exists for something we draw clumsily with the pitiful range of "safe
ASCII" characters. I would also expect '<<' to become
« and wonder why we have to use #{} for a map and #name{} for a record
and {} for a tuple when we have 「」 and 【】 and 〔〕 and... (etc.) available.

As an actual daily user of UTF-8 values that aren't just the tiny corner
used in the West, I feel like this opens the door to a lot more confusion
unless we are very careful about how it is implemented. It is, of course,
"easy" to just say "anything in ASCII single-quotes is an atom, and the
only way to not single-quote an atom is to use lower-case ASCII letters"
-- but in so doing you also guarantee that almost nobody will ever take
advantage of UTF-8 atoms. This will make their use very rare, and so will
make their presence in a codebase somewhat annoying. (Prepare to see
comments on GitHub commits like "Wtf, normal atoms aren't l33t enough for
this clown?!?")
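The "easy" rule quoted above could be sketched roughly like this. It is a
simplified Python illustration, not Erlang's actual scanner: the name
is_atom_literal is hypothetical, escapes inside quotes are ignored, and
the unquoted form is assumed to be a lower-case ASCII letter followed by
ASCII letters, digits, "_" or "@":

```python
import re

# Assumed shape of an unquoted atom: a lower-case ASCII letter, then
# ASCII letters, digits, "_" or "@".
UNQUOTED_ATOM = re.compile(r"[a-z][A-Za-z0-9_@]*")


def is_atom_literal(token: str) -> bool:
    """Hypothetical check for the 'easy' rule: anything wrapped in ASCII
    single quotes is an atom; otherwise only the ASCII form qualifies."""
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        return True  # simplification: escapes inside the quotes are ignored
    return UNQUOTED_ATOM.fullmatch(token) is not None
```

With this rule 'サーバー' (quoted) is an atom but a bare サーバー is not, so
every non-ASCII atom pays the quoting tax, which is the reason I expect
almost nobody would bother.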

I'm not saying it's a bad idea or a bad thing to think about -- but a lot
more thought would need to go into this than simply defining a parsing rule
or cobbling together a patch that builds cleanly. Just like making snap
decisions about what "whitespace" means, this will be something we're stuck
with forever (all of us -- including people who program in languages that
have analogs to the upper/lower distinction).

-Craig