[erlang-questions] Atom Unicode Support
Fred Hebert
mononcqc@REDACTED
Wed Feb 3 14:20:31 CET 2016
On 02/03, Max Lapshin wrote:
>This is one single atom: my package
>It is not 2 words, it is a single word that has non-breakable space inside
>it. Good luck for debugging =)
I believe the non-breakable space character is still in the Space
Separator [Zs] character class and should ideally be handled as such by
a utf8-aware compiler. So the same way you'd need to type 'my package'
(with a regular space), you'd need to type 'my package' (with a nbsp).
If this isn't respected, I'd probably expect this to be a language
problem, not a unicode issue.
I believe Erlang 17 and earlier would complain about invalid syntax
there. Starting in 18, such characters are seen as valid spaces in a
program and are treated the same way a regular space is.
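As a quick check of the [Zs] claim, here is a minimal sketch in Python (not Erlang, and not a description of how the Erlang compiler classifies characters). It only asks the Unicode database for the general category of each character; the helper name is_space_separator is made up for illustration.

```python
import unicodedata

# Characters discussed in this thread, plus a regular space for comparison.
CANDIDATES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "NARROW NO-BREAK SPACE": "\u202f",
}


def is_space_separator(ch: str) -> bool:
    """Return True if ch is in the Unicode general category Zs."""
    return unicodedata.category(ch) == "Zs"


if __name__ == "__main__":
    for name, ch in CANDIDATES.items():
        # Print the code point, its name, and its general category.
        print(f"U+{ord(ch):04X} {name}: {unicodedata.category(ch)}")
```

All three characters report the same category, which is why a Unicode-aware tokenizer could reasonably treat them alike.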
>
>Of course you may say me: hire programmer that makes such things. Ok, no
>problems. But what to do with copy-paste from skype/slack, where such
>symbols are translated into nice utf8 automatically?
>
I am a French-speaking user, and non-breakable spaces have their
place in regular usage. For example, a colon (:) takes a leading narrow
non-breakable space, and that space must be there while keeping the
punctuation mark on the same line as its leading word.
I have my editor set to highlight such spaces with a special
character:
    set list listchars=tab:»·,trail:·,nbsp:·
All tabs, trailing spaces, and non-breakable space characters will
show up in text. So 'my package' actually shows up as 'my·package' here,
with some highlight color to make sure it's not just the literal '·'.
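For people whose editor cannot do this, the same idea can be approximated with a small script. This is only a sketch in Python under my own assumptions: the function names reveal_spaces and find_odd_spaces are hypothetical, and "odd space" here means any [Zs] character other than the regular U+0020 space.

```python
import unicodedata

MARK = "\u00b7"  # middle dot, the same marker as in the listchars setting


def _is_odd_space(ch: str) -> bool:
    """True for any Zs character that is not the regular ASCII space."""
    return ch != " " and unicodedata.category(ch) == "Zs"


def reveal_spaces(line: str) -> str:
    """Replace every non-regular space in line with a visible marker."""
    return "".join(MARK if _is_odd_space(ch) else ch for ch in line)


def find_odd_spaces(text: str):
    """Yield (line number, column, character name), both 1-based."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if _is_odd_space(ch):
                yield lineno, col, unicodedata.name(ch)


if __name__ == "__main__":
    sample = "'my\u00a0package'"
    print(reveal_spaces(sample))
    for hit in find_odd_spaces(sample):
        print(hit)
```

Running something like this over text pasted from a chat client would point at the exact line and column of each such character.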
So assuming the code does not work properly, and that you are one of
these programmers working with these characters on a day-to-day basis,
there are still ways to work around it without confusion.
That of course ignores specially crafted code built with the sole
intention of confusing people (such as using the Greek Α rather than the
[whatever your locale] A in function or variable identifiers).
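Even that case can be inspected mechanically. The sketch below, again in Python and with a hypothetical helper name (describe_non_ascii), lists the Unicode name of every non-ASCII character in an identifier, so a Greek capital alpha no longer passes for a Latin A. It is a rough aid, not a full confusable-character detector.

```python
import unicodedata


def describe_non_ascii(identifier: str):
    """Return (character, Unicode name) for each non-ASCII character."""
    return [(ch, unicodedata.name(ch)) for ch in identifier if ord(ch) > 127]


if __name__ == "__main__":
    latin = "Apple"
    greek = "\u0391pple"  # starts with GREEK CAPITAL LETTER ALPHA
    print(latin == greek)  # the two strings differ despite looking alike
    print(describe_non_ascii(latin))
    print(describe_non_ascii(greek))
```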
>It is very good that we all have about 80-90 symbols to write code that
>other people understand, but I really don't understand what is the profit
>of adding ability to make code non-understandable by people from other
>cultures.
You make the assumption that without Unicode, Japanese programmers would
write code in English rather than transliterating it into a Latin alphabet
(say with ISO 3602), for example. This doesn't happen if the programmer
does not know English, or if their target audience (coworkers for
example) do not speak English. They just find a very annoying workaround
to get their meaning across in the language they feel they should use.
The reality is that if people feel like writing code in their own
language, they will do so. If I'm writing code about an ATM in French, I
might use the word 'guichet_automatique' or 'gab' instead of 'atm'. You
would still be lost with a Latin alphabet and lose all meaning. And the
comments may very well be in French too, since they'd be intended for
French speakers.
So the benefit is that people can write code in their own native
language unhindered, and it won't change anything about your comprehension
of code because you likely wouldn't understand it anyway, or wouldn't be
the target audience of said code and will in all likelihood just not see
it in the first place.
Regards,
Fred.