Language change proposal

Fri Oct 31 11:38:46 CET 2003

Richard A. O'Keefe wrote:
>     One of the long term goals for Erlang is that it should support Unicode;

This is something that I'd advise against.
True, it would be nice to be able to write your source code using 
native-language identifiers without having to worry about ASCII 
representation.
However, there are two problems here:

1) If somebody gives me software to maintain, I might hit a, say, 
Chinese glyph somewhere. I'd have to download the proper font just to be 
able to look at the sources.
I have programmed in Java, which also uses Unicode. I tend to avoid the 
German special characters ÄÖÜäöüß even if I program in German; I use 
their transcriptions AE OE UE ae oe ue ss instead.

2) There are many glyphs that look the same. For example, that "a" 
letter might actually have an entirely different encoding since it's 
from the Russian alphabet.
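
To make the look-alike problem concrete, here is a small sketch. It is 
in Python rather than Erlang, purely for illustration, and the names 
latin_a, cyrillic_a and describe are made up for this example. It 
compares the Latin "a" with the Cyrillic letter that most fonts render 
identically:

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, drawn like a Latin "a" in most fonts


def describe(ch):
    """Return the code point and the official Unicode name of a character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


print(describe(latin_a))
print(describe(cyrillic_a))
# The two characters look the same on screen but do not compare equal.
print(latin_a == cyrillic_a)
```

An identifier containing the second letter would be a different 
identifier, even though nobody could tell by reading the source.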

Unicode also has issues with letter case.
For one, there is no clean one-to-one mapping between lowercase and 
uppercase letters (and there cannot be: for example, the German ß has no 
uppercase equivalent; it transliterates to SS or SZ depending on 
personal whim).
Additionally, Unicode has /three/ lettercase categories: lower, upper, 
and title case. (The latter information is gleaned from the Haskell 
language report, I don't know anything further about Unicode.)
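
Both case issues can be observed with a short sketch. Again this is 
Python, not Erlang, and it shows how one particular Unicode-aware 
runtime behaves, not how Erlang would have to behave. The variable names 
are invented for the example:

```python
import unicodedata

sharp_s = "\u00df"          # German sharp s
upper = sharp_s.upper()     # the default case mapping expands it to two letters
round_trip = upper.lower()  # lowercasing again does not restore the original

# The digraph DZ with caron exists in all three case forms.
digraph_forms = ("\u01c4", "\u01c5", "\u01c6")
categories = [unicodedata.category(c) for c in digraph_forms]

print(upper, round_trip, round_trip == sharp_s)
print(categories)  # Lu = uppercase, Lt = titlecase, Ll = lowercase
```

So uppercasing can change the length of a string, and a round trip 
through uppercase is lossy.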

(There's also a portability issue: there are still EBCDIC machines 
around that don't support Unicode. I don't think this is relevant for 
Erlang though *g*)

My personal idea about Unicode is that it is massively overengineered 
for simple tasks like representing source code.
With one exception: it would be very nice if the language allowed 
Unicode within string literals. That's more a question of how to 
integrate binary data into source code well.
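
One way this could look, sketched in Python since I am not assuming any 
particular Erlang syntax: the source file itself stays pure ASCII, and 
the non-ASCII characters are written as escapes inside the literal. The 
name greeting is just an example.

```python
# The literal below contains only ASCII characters in the source file,
# yet it denotes the five-character string "Gr", u-umlaut, sharp s, "e".
greeting = "Gr\u00fc\u00dfe"

# What actually goes out on the wire is a separate encoding decision.
utf8_bytes = greeting.encode("utf-8")

print(len(greeting))      # number of characters
print(len(utf8_bytes))    # number of bytes after encoding
print(utf8_bytes.decode("utf-8") == greeting)
```

The point is that identifiers and keywords never need anything beyond 
ASCII for this to work.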

>     many existing Erlang users already combine Erlang and XML (hence xmerl,
>     amongst others), and XML requires Unicode.  If you are going to process
>     XML in Erlang, it is helpful if you can represent XML identifiers as
>     Erlang atoms, and at a minimum you need to be able to hold Unicode
>     characters in strings.
>     I'd be happy with ISO Latin 1 as the default encoding, but I suppose
>     Windows programmers wouldn't be, and in today's Europe, it could be
>     useful to be able to mention the Euro, so 8859-15 might be a good
>     choice; need I go on?

What are the advantages of keeping some XML data as atoms? I would have 
thought that, ultimately, they are just strings, and all techniques that 
apply to atoms should apply to strings as well.

About ISO Latin and Windows: That's one of the reasons why I don't use 
umlauts in my source code, except when it comes to literal strings.
And I'm painfully aware that having umlauts in strings makes my sources 
nonportable; the better solution is to have some internationalization 
support.
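
The portability problem is easy to demonstrate. The following Python 
sketch (illustrative only, with made-up variable names) shows that the 
Euro sign and an umlaut end up as different bytes, or as different 
characters, depending on which 8-bit encoding each side assumes:

```python
euro = "\u20ac"

latin9_bytes = euro.encode("iso8859_15")  # ISO 8859-15 puts the Euro at 0xA4
cp1252_bytes = euro.encode("cp1252")      # Windows-1252 puts it at 0x80

# Reading the 8859-15 byte as ISO Latin 1 yields the generic currency sign.
misread = latin9_bytes.decode("latin-1")

# ISO Latin 1 itself has no Euro sign at all.
try:
    euro.encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# An a-umlaut saved as Latin 1 and viewed under the old DOS code page 437
# shows up as some other character entirely.
umlaut_bytes = "\u00e4".encode("latin-1")
seen_on_dos = umlaut_bytes.decode("cp437")

print(latin9_bytes, cp1252_bytes, repr(misread), latin1_has_euro)
print(umlaut_bytes, repr(seen_on_dos))
```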

Regards,
Jo