[erlang-questions] Atom Unicode Support

Wed Feb 3 14:20:31 CET 2016

On 02/03, Max Lapshin wrote:
>This is one single atom:    my package
>It is not 2 words, it is a single word that has non-breakable space inside
>it. Good luck for debugging =)

I believe the non-breakable space character is still in the Space
Separator [Zs] general category and should ideally be handled as such by
a utf8-aware compiler. So the same way you'd need to type 'my package' 
(with a regular space), you'd need to type 'my package' (with a nbsp).
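
As a quick sanity check of the [Zs] claim, here is a minimal sketch in
Python (not Erlang, just because its standard library exposes the Unicode
database directly). The CANDIDATES table and the general_category helper
are names I made up for illustration:

```python
import unicodedata

# Whitespace characters discussed in this thread (illustrative selection).
CANDIDATES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "NARROW NO-BREAK SPACE": "\u202f",
}

def general_category(ch):
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)

for name, ch in CANDIDATES.items():
    # Print the code point, its name, and its general category.
    print(f"U+{ord(ch):04X} {name}: {general_category(ch)}")
```

All three should report the same category, which is why I would expect a
Unicode-aware tokenizer to treat them alike.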

If this isn't respected, I'd expect it to be a language problem, not a
Unicode issue.

I believe Erlang 17 and earlier would complain about invalid syntax
there. Starting in 18, such characters are seen as valid spaces in a
program and are handled the same way a regular space is.

>
>Of course you may say me: hire programmer that makes such things. Ok, no
>problems. But what to do with copy-paste from skype/slack, where such
>symbols are translated into nice utf8 automatically?
>

I am a French-speaking user, and non-breakable spaces have their place
in regular usage. For example, a colon (:) takes a leading narrow
non-breakable space, and that space must be there to keep the
punctuation mark on the same line as the word that precedes it.

I have my editor set to highlight such spaces with a special
character:

    set list listchars=tab:»·,trail:·,nbsp:·

All tabs, trailing spaces, and non-breakable white space characters will
show up in text. So 'my package' actually shows up as 'my·package' here,
with some highlight color to make sure it's not just the literal '·'.

So assuming the code does not work properly, and that you are one of 
these programmers working with these characters on a day-to-day basis, 
there are still ways to work around it without confusion.
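
For those without such an editor setting, a small script can flag the
same characters. This is only a sketch, in Python, and find_odd_spaces
and the sample snippet are hypothetical names and data used for
illustration:

```python
import unicodedata

def find_odd_spaces(text):
    """Return (line, column, code point) for each space separator
    that is not the plain ASCII space."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category Zs covers the regular space, the non-breakable
            # space, and the other space separators.
            if ch != " " and unicodedata.category(ch) == "Zs":
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits

# Made-up source text containing one non-breakable space inside an atom.
sample = "-module(demo).\nfoo() -> 'my\u00a0package'.\n"
print(find_odd_spaces(sample))
```

Line and column numbers are 1-based so they match what most editors
display.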

That of course ignores code crafted specifically with the sole
intention of confusing people (such as using the Greek Α rather than the
[whatever your locale] A in function or variable identifiers).
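
Even that kind of trick is detectable with tooling. A rough sketch in
Python, assuming the simple policy that any non-ASCII character in an
identifier deserves a second look (describe and non_ascii_chars are
hypothetical helper names):

```python
import unicodedata

def describe(ch):
    """Format a character as its code point and Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def non_ascii_chars(identifier):
    """List the non-ASCII characters found in an identifier."""
    return [describe(ch) for ch in identifier if ord(ch) > 127]

latin = "Alpha"        # starts with LATIN CAPITAL LETTER A
greek = "\u0391lpha"   # starts with GREEK CAPITAL LETTER ALPHA

print(latin == greek)          # False, despite looking identical
print(non_ascii_chars(greek))  # ['U+0391 GREEK CAPITAL LETTER ALPHA']
```

This flags every non-ASCII identifier, including perfectly legitimate
ones, so it is a review aid rather than a rule to enforce.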

>It is very good that we all have about 80-90 symbols to write code that
>other people understand, but I really don't understand what is the profit
>of adding ability to make code non-understandable by people from other
>cultures.

You make the assumption that without Unicode, Japanese programmers would
write code in English rather than transliterating it into a Latin
alphabet (say with ISO 3602), for example. This doesn't happen if the
programmer does not know English, or if their target audience (coworkers
for example) does not speak English. They just find a very annoying
workaround to get their meaning across in the language they feel they
should use.

The reality is that if people feel like writing code in their own
language, they will do so. If I'm writing code about an ATM in French, I
might use the word 'guichet_automatique' or 'gab' instead of 'atm'. Even
with a Latin alphabet, you would still be lost and miss the meaning. And
the comments may very well be in French too, since they'd be addressed
to French speakers.

So the benefit is that people can write code in their own native
language unhindered. It won't change anything about your comprehension
of the code, because you likely wouldn't understand it anyway, or you
wouldn't be its target audience and would in all likelihood never see
it in the first place.

Regards,
Fred.