Language change proposal

Sat Nov 1 05:11:56 CET 2003

> Richard A. O'Keefe wrote:
>>     One of the long term goals for Erlang is that it should support 
>> Unicode;
>
> This is something that I'd advise against.
> True, it would be nice to be able to write your source code using 
> native-language identifiers without having to worry about ASCII 
> representation.
> However, there are two problems here:

There is also the problem of mixing native-language identifiers with 
the English ones from the OTP libs, which is bound to look rather odd 
and might be confusing in cases where a word means different things in 
each language. It also limits the portability of the code, as fewer 
people can understand it - imagine Linux written in Finnish.

>
> 1) If somebody gives me software to maintain, I might hit a, say, 
> Chinese glyph somewhere. I'd have to download the proper font just to 
> be able to look at the sources.

It might also be a bit tricky to figure out how to type the 
glyph(s), if it's something like Japanese, Chinese or Korean.

> I have programmed in Java, which also uses Unicode. I tend to avoid 
> the German special characters ÄÖÜäöüß even if I program in German; I 
> use their transcriptions AE OE UE ae oe ue ss instead.
>
> 2) There are many glyphs that look the same. For example, that "a" 
> letter might actually have an entirely different encoding since it's 
> from the Russian alphabet.
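
To make point 2 concrete, here is a minimal Python sketch using only the 
standard unicodedata module. The helper name describe is made up for this 
illustration. It shows that a Latin "a" and a Cyrillic "а" are different 
code points even though most fonts draw them identically, and that 
compatibility normalization (NFKC) does not merge them.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, drawn like Latin "a" in most fonts


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A

# Two identifiers that look the same on screen compare as different.
print(latin_a == cyrillic_a)  # False

# NFKC normalization leaves the Cyrillic letter untouched, so normalizing
# source code would not remove this ambiguity.
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)  # False
```

A compiler that accepted both letters in identifiers would therefore treat 
two visually identical names as distinct.
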
>
> Unicode also has issues with letter case.

Isn't this really a kind of design error/bug/feature in Erlang?
While I personally would prefer code to be written in English I don't 
see any real problems with using Unicode. The simplest way would 
probably be to introduce some kind of standard upper case marker 
(character) in the case that there is no upper case version of a 
character. Another somewhat more confusing choice would be to require 
that variables can only start with upper case Unicode letters (possibly 
only the characters supplied in the current Erlang character set).
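
Both options can be sketched as a rule for deciding whether an identifier 
is a variable. Erlang today treats a name that starts with an upper case 
letter or an underscore as a variable. The Python sketch below is only an 
illustration under loud assumptions: the marker character "^" and the 
function is_variable are hypothetical and not part of Erlang or of any 
proposal in this thread.

```python
import unicodedata

MARKER = "^"  # hypothetical "upper case marker", NOT real Erlang syntax


def is_variable(name):
    """Classify an identifier as a variable under the assumed rules.

    A name counts as a variable if it starts with an upper case letter
    (Unicode category Lu), an underscore, or the hypothetical marker
    followed by at least one more character.
    """
    if not name:
        return False
    first = name[0]
    if first == MARKER:
        return len(name) > 1
    return first == "_" or unicodedata.category(first) == "Lu"


print(is_variable("Count"))      # True, starts with Latin upper case
print(is_variable("count"))      # False, would be an atom
print(is_variable("\u6570"))     # False, CJK letters have no case
print(is_variable("^\u6570"))    # True, marked explicitly
```

The marker only matters for scripts without case; Latin, Greek or 
Cyrillic names would keep working as they do now.
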

> For one, there is no good mapping of lowercase and uppercase letters 
> (and cannot be: for example, the German ß has no uppercase equivalent, 
> it transliterates to SS or SZ depending on personal whim).
> Additionally, Unicode has /three/ lettercase categories: lower, upper, 
> and title case. (The latter information is gleaned from the Haskell 
> language report, I don't know anything further about Unicode.)
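
These case-mapping points can be checked with a short Python sketch that 
uses only the standard unicodedata module. The variable names are chosen 
just for this illustration. It shows that upper-casing ß changes the 
length of the string, that the round trip does not give ß back, and that 
Unicode has a separate title-case category next to upper and lower case.

```python
import unicodedata

sharp_s = "\u00df"  # ß, LATIN SMALL LETTER SHARP S

upper = sharp_s.upper()
print(upper)                 # SS
print(len(sharp_s), len(upper))
print(upper.lower() == sharp_s)  # False: the round trip yields "ss"

# General categories: Lu = upper case, Ll = lower case, Lt = title case,
# Lo = letter without case.
title_dz = "\u01c5"  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
print(unicodedata.category("A"))        # Lu
print(unicodedata.category("a"))        # Ll
print(unicodedata.category(title_dz))   # Lt
print(unicodedata.category("\u4e2d"))   # Lo
```

So a rule like "variables start with an upper case letter" cannot be 
applied to every script by a simple one-to-one case conversion.
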
>
> (There's also a portability issue: there are still EBCDIC machines 
> around that don't support Unicode. I don't think this is relevant for 
> Erlang though *g*)
>
>
> My personal idea about Unicode is that it is massively overengineered 
> for simple tasks like representing source code.
> With one exception: it would be very nice if the language allowed 
> Unicode within string literals. That's more a question of how to 
> integrate binary data into source code well.

It might also be useful in comments, if they aren't written in English 
- Japanese, Russian and other languages with completely different 
character sets would be rather tedious to encode in some kind of 
ASCII/Latin-1 version.