[erlang-questions] A proposal for Unicode variable and atom names in Erlang.

Richard O'Keefe ok@REDACTED
Mon Nov 5 06:49:29 CET 2012


On 5/11/2012, at 4:42 PM, Henning Diedrich wrote:

> 
> On Oct 30, 2012, at 5:33 AM, Richard O'Keefe <ok@REDACTED> wrote:
> 
>>> Here is an example:
>>> I want to write a module in Turkish, then the  "length" id will be a
>>> variable, not a function.
>> 
>> What on earth are you talking about?  Lower case l is a lower case
>> letter, whether you're writing English, Turkish, or Old High Martian.
> 
> 
> My point, maybe Michael's in a way, too, was this:
> 
> 1> Iength = length.
> length
> 2> Ienght.
> * 1: variable 'Ienght' is unbound

That's because the last two letters were swapped.  There's nothing
here to do with Turkish.  (For that matter, while Turkish has an
extra dotted capital I İ and an extra dotless small i ı, it uses
the same dotless capital I that we do; there it is simply the
capital of the dotless small i.)

> 3> length = Iength.
> length
> 4> Iength.
> length
> 
> Fun factor depends on font you're using.

To quote a Pogo strip, "you have the wrong mistake".

I am reminded of a burglar indignantly protesting his
innocence:  "I didn't rob *THAT* house" (but don't ask
me about the one next door).

We *already* have confusable characters in Latin-1:
i/l/1, o/O/0 -- I'm seeing a slashed zero here and
very much wish I weren't because that's not how I was taught
to write a zero -- 2/Z, s/5, and if you had to read the handwriting
I'm reading during marking, you'd wonder if there were _any_ two
characters that couldn't be confused.  (There was a time when
Australian school-children were _taught_ to write unclosed
small "p" letters so they looked like long-tailed "r".  Why?)
So our burglar is saying "I don't have THOSE [Unicode] confusable
characters" (just don't ask me about all the others I do have).

If you are talking about the confusability of characters,
you could bring in CAPITAL A WITH RING ABOVE and ANGSTROM SIGN,
or for that matter the already mentioned Latin capital A,
Cyrillic capital A, and Greek capital alpha, all of which look
exactly the same.
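
To make the look-alikes concrete, here is a minimal sketch that lists the
code points involved and checks which pairs Unicode normalization treats as
the same character.  The module and function names are hypothetical, and
unicode:characters_to_nfc_list/1 assumes a reasonably recent OTP (20 or
later), so it would not have run at the time of this message.

```erlang
-module(confusables_demo).
-export([code_points/0, same_after_nfc/2]).

%% Code points are written numerically so this source file stays ASCII.
code_points() ->
    [{latin_capital_a,           16#0041},
     {greek_capital_alpha,       16#0391},
     {cyrillic_capital_a,        16#0410},
     {capital_a_with_ring_above, 16#00C5},
     {angstrom_sign,             16#212B}].

%% True when two code points become the same string under NFC normalization.
same_after_nfc(A, B) ->
    unicode:characters_to_nfc_list([A]) =:= unicode:characters_to_nfc_list([B]).
```

Only the ANGSTROM SIGN / A WITH RING ABOVE pair is unified by normalization;
the Latin, Greek, and Cyrillic capitals remain three distinct characters, so
normalization alone does not make the look-alike problem go away.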

If we once allow any kind of vaguely stringly-like thing to
include Unicode characters, we are *going* to have the problem
of confusable letters in data.  You could restrict identifiers
to be sequences of a/A characters and we'd still have the
problem in data.

Of all places, the very topmost *safest* place to have the
problem is in Erlang variable names, because of the singleton
style check.  The next safest is probably in function names.
These are places where the compiler will _tell_ us if things
do not match up.
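
As a rough illustration of the compiler "telling us", the sketch below feeds
a one-function source string through the standard erl_scan, erl_parse, and
erl_lint modules and looks for an unbound-variable report.  The module name
lint_demo, the helper has_unbound/2, and the throwaway module name scratch
are made up for this example; the exact shape of the error descriptor is
matched loosely because it may differ between OTP releases.

```erlang
-module(lint_demo).
-export([lint/1, has_unbound/2]).

%% Parse a single function definition given as a string and run the
%% compiler's lint pass over it inside a throwaway module.
lint(FunSource) ->
    {ok, Tokens, _End} = erl_scan:string(FunSource),
    {ok, {function, _, Name, Arity, _} = Fun} = erl_parse:parse_form(Tokens),
    Forms = [{attribute, 1, module, scratch},
             {attribute, 1, export, [{Name, Arity}]},
             Fun],
    erl_lint:module(Forms).

%% True when the lint result reports Var as an unbound variable.
has_unbound(Var, {error, Errors, _Warnings}) ->
    IsUnbound =
        fun({_Loc, erl_lint, Desc}) when is_tuple(Desc), tuple_size(Desc) >= 2 ->
                element(1, Desc) =:= unbound_var andalso element(2, Desc) =:= Var;
           (_) ->
                false
        end,
    lists:any(fun({_File, Es}) -> lists:any(IsUnbound, Es) end, Errors);
has_unbound(_Var, {ok, _Warnings}) ->
    false.
```

For instance, lint_demo:lint("f(Iength) -> Ienght.\n") should come back as
an error result in which 'Ienght' is flagged as unbound, while the
consistently spelled "f(Iength) -> Iength.\n" should lint cleanly.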

Suppose someone writes

Ο_Φόβος = ο_φονιάς(του_μυαλού)

Yes, the Ο and ο will look like an O and an o, so someone
_could_ trick you.  But they won't be TRYING to.  And if they
_do_ type too much with the wrong keyboard set (as I did while
typing this!) the compiler will tell them.

And all this cowering in fear at the very time that we're seeing
more and more type checking in Erlang, checking that would quite
certainly catch such mistakes very well.  Makes you wonder about
people, really it does.

[The example is as close as I could get to 'Fear is the mind-killer'.]



