Language change proposal

Mon Nov 3 03:43:57 CET 2003

Håkan Stenholm <hakan.stenholm@REDACTED> wrote:
	> Unicode also has issues with letter case.

	Isn't this really a kind of design error/bug/feature in Erlang?

No.  Erlang requires that some characters be classified as upper-case
letters (I'd include title-case letters in that set) and some other
characters be classified as not-upper-case letters (include lower case,
non-case, syllables, logograms).  The upper case letters should contain
the 26 ASCII upper case ones; the not-upper-case letters should contain
the 26 ASCII lower case ones; the two sets should be disjoint; and the
various other characters (digits, layout, punctuation) should be disjoint
from both.  Works fine for Unicode.
This was all sorted out for the ISO Prolog standard.
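
A rough sketch of that three-way split, written in Python only because
its standard unicodedata module exposes the Unicode general categories.
The function name classify and the exact choice of categories (Lu and Lt
as "upper"; Ll, Lm and Lo as "not-upper") are assumptions made for
illustration, not what any Erlang or Prolog system actually does.

```python
import unicodedata

# Assumed mapping: upper-case and title-case letters count as "upper";
# lower-case, modifier and other letters count as "not-upper".
UPPER = ("Lu", "Lt")
NOT_UPPER = ("Ll", "Lm", "Lo")

def classify(ch):
    """Place one character into exactly one of three classes."""
    category = unicodedata.category(ch)
    if category in UPPER:
        return "upper"
    if category in NOT_UPPER:
        return "not-upper"
    return "other"  # digits, layout, punctuation, and so on
```

Because every character has exactly one general category, the classes
are disjoint by construction, and the ASCII letters land where they
should.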

	While I personally would prefer code to be written in English I don't 
	see any real problems with using Unicode. The simplest way would 
	probably be to introduce some kind of standard upper case marker 
	(character) in the case that there is no upper case version of a 
	character.

Erlang syntax doesn't *care* whether there is an upper case version of a
character or not.  People writing in Chinese, Japanese, Korean, &c should
start their variables with an "_" (like people writing Prolog for those
languages); that's enough.  The problem was first considered for Prolog
back in about 1983, as far as I know; Quintus implemented this solution
(variable starts with any upper case letter; if your script doesn't have
upper case letters, use a leading "_") by about 1985 or 1986.
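
The rule is small enough to sketch in a few lines.  This is Python, not
Quintus or Erlang source, and starts_variable is a made-up name; the
only assumption is that "upper case letter" means Unicode general
category Lu or Lt.

```python
import unicodedata

def starts_variable(token):
    """True if the token would be read as a variable under the rule:
    a leading "_" or a leading upper-case (or title-case) letter."""
    if not token:
        return False
    first = token[0]
    return first == "_" or unicodedata.category(first) in ("Lu", "Lt")
```

Under this sketch "Count" is a variable and "count" is not, while a name
written entirely in a caseless script is an ordinary name unless it is
given a leading "_".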

	Another somewhat more confusing choice would be to require that
	functions can only start with upper case Unicode letters
	(possibly only the characters supplied in the current Erlang
	character set).

That would certainly be confusing, since Erlang function names normally
start with not-upper-case letters.

By the way, the Unicode book spells out clear, simple, and usable rules
for identifier syntax.  I wish people would read that before trying to
solve problems that don't actually exist.  (There are more than enough
Unicode problems that _do_ exist...)

	[Unicode]	
	might also be useful in comments, if they aren't written in English 
	- Japanese, Russian and other languages that have completely different 
	character sets will be rather tedious to encode in some kind of 
	ASCII/latin1 version.

Heck, IBM mainframe programmers have been able to use wide characters
in strings and comments for at least 20 years.  In Fortran, yet.  My
point is that IF you are going to do this, you had better say up front,
with a -erlang(Encoding,Version) declaration, which character set you are
using in those comments and strings, lest they be misunderstood.
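
To see what "misunderstood" means in practice, here is a small Python
sketch (the helper decode_with is hypothetical) that reads the same
bytes under three different assumed encodings.

```python
def decode_with(raw, encoding):
    """Decode raw bytes under an assumed encoding; None if they are invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# The name "Håkan" as it would be stored in an ISO 8859-1 source file.
raw = "H\u00e5kan".encode("latin-1")

as_latin1 = decode_with(raw, "latin-1")  # the intended reading
as_koi8r = decode_with(raw, "koi8_r")    # decodes, but to a different string
as_utf8 = decode_with(raw, "utf-8")      # not valid UTF-8 at all
```

Only the first reading gives back the original text; without a
declaration the reader has no way to know which one was meant.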