Language change proposal
Fri Oct 31 11:38:46 CET 2003
Richard A. O'Keefe wrote:
> One of the long term goals for Erlang is that it should support Unicode;
This is something that I'd advise against.
True, it would be nice to be able to write your source code using
native-language identifiers without having to worry about ASCII.
However, there are two problems here:
1) If somebody gives me software to maintain, I might hit a, say,
Chinese glyph somewhere. I'd have to download the proper font just to be
able to look at the sources.
I have programmed in Java, which also uses Unicode. I tend to avoid the
German special characters ÄÖÜäöüß even if I program in German; I use
their transcriptions AE OE UE ae oe ue ss instead.
2) There are many glyphs that look the same. For example, that "a"
letter might actually have an entirely different encoding since it's
from the Russian alphabet.
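
To make the look-alike problem concrete, here is a minimal sketch in Python 3 (chosen only for illustration; the `describe` helper is a hypothetical name, not part of any library). It compares the Latin "a" with the Cyrillic letter that usually renders identically:

```python
import unicodedata

latin_a = "a"          # U+0061, written directly
cyrillic_a = "\u0430"  # U+0430, written as an escape so the source stays ASCII


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The two letters usually render identically but are distinct characters,
# so an identifier spelled with one will not match one spelled with the other.
same = latin_a == cyrillic_a
print(describe(latin_a))
print(describe(cyrillic_a))
print("equal:", same)
```

A compiler that compares identifiers by code point would treat names built from these two letters as different identifiers, even though a reader may not be able to tell them apart on screen.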
Unicode also has issues with letter case.
For one, there is no good mapping of lowercase and uppercase letters
(and cannot be: for example, the German ß has no uppercase equivalent,
it transliterates to SS or SZ depending on personal whim).
Additionally, Unicode has /three/ lettercase categories: lower, upper,
and title case. (The latter information is gleaned from the Haskell
language report, I don't know anything further about Unicode.)
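
The case issues can be checked with a short sketch, again in Python 3 and only as an illustration; it assumes the Unicode tables shipped with a current Python, and the variable names are made up for the example. It uppercases ß and looks up the general category of the three forms of the "dž" digraph, which has separate upper, title, and lower case characters:

```python
import unicodedata

sharp_s = "\u00df"              # German sharp s
upper = sharp_s.upper()         # Python applies the full case mapping
round_trip = upper.lower()      # lowercasing does not bring the sharp s back

# General categories: Lu = uppercase, Lt = titlecase, Ll = lowercase.
dz_categories = {
    "upper": unicodedata.category("\u01c4"),  # LATIN CAPITAL LETTER DZ WITH CARON
    "title": unicodedata.category("\u01c5"),  # capital D with small z with caron
    "lower": unicodedata.category("\u01c6"),  # LATIN SMALL LETTER DZ WITH CARON
}

print(upper, round_trip, dz_categories)
```

Uppercasing changes the string length here, and the conversion is not reversible, which is the kind of surprise a case-sensitive language syntax would have to define rules for.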
(There's also a portability issue: there are still EBCDIC machines
around that don't support Unicode. I don't think this is relevant for
Erlang though *g*)
My personal idea about Unicode is that it is massively overengineered
for simple tasks like representing source code.
With one exception: it would be very nice if the language allowed
Unicode within string literals. That's more a question of how to
integrate binary data into source code well.
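
One way to picture this is a source file that stays pure ASCII and spells non-ASCII characters in string literals as escapes. The sketch below uses Python 3 escape syntax purely as an illustration, not as a proposal for Erlang syntax, and the variable names are invented. It also shows that the bytes produced depend on the encoding chosen at output time:

```python
# The source text itself is ASCII; only the string values contain non-ASCII.
greeting = "Gr\u00fc\u00dfe"   # u with umlaut, then sharp s
euro = "\u20ac"                # EURO SIGN

utf8_bytes = greeting.encode("utf-8")      # two bytes per non-ASCII letter here
latin1_bytes = greeting.encode("latin-1")  # one byte per character

# ISO 8859-1 has no Euro sign; ISO 8859-15 puts it at 0xA4.
try:
    euro.encode("latin-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
euro_latin9 = euro.encode("iso8859-15")

print(utf8_bytes, latin1_bytes, euro_in_latin1, euro_latin9)
```

The literal is portable because the file contains nothing outside ASCII; the choice of byte encoding is deferred to the point where the string leaves the program.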
> many existing Erlang users already combine Erlang and XML (hence xmerl,
> amongst others), and XML requires Unicode. If you are going to process
> XML in Erlang, it is helpful if you can represent XML identifiers as
> Erlang atoms, and at a minimum you need to be able to hold Unicode
> characters in strings.
> I'd be happy with ISO Latin 1 as the default encoding, but I suppose
> Windows programmers wouldn't be, and in today's Europe, it could be
> useful to be able to mention the Euro, so 8859-15 might be a good
> choice; need I go on?
What are the advantages of keeping some XML data as atoms? I would have
thought that, ultimately, they are just strings, and all techniques that
apply to atoms should apply to strings as well.
About ISO Latin and Windows: That's one of the reasons why I don't use
umlauts in my source code, except when it comes to literal strings.
And I'm painfully aware that having umlauts in strings makes my sources
nonportable; the better solution is to have some internationalization