[erlang-questions] A proposal for Unicode variable and atom names in Erlang.
Richard O'Keefe
ok@REDACTED
Fri Oct 26 05:32:45 CEST 2012
On 22/10/2012, at 11:45 PM, Yurii Rashkovskii wrote:
> Also, consider this: there are characters that look the same but encoded differently.
You did read the part of the proposal that said to normalise?
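To make the normalisation point concrete, here is a minimal sketch in Python. It assumes NFC as the normalisation form, which is my assumption for illustration and not necessarily the form the proposal specifies. The helper name same_identifier is hypothetical.

```python
import unicodedata

def same_identifier(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two spellings after normalising both (form is an assumption)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

precomposed = "caf\u00e9"   # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# The raw strings differ code point by code point...
print(precomposed == decomposed)
# ...but compare equal once both are normalised.
print(same_identifier(precomposed, decomposed))
```

Normalisation only merges canonically equivalent spellings like these. It does not turn one colon-like character into another, which is why the character classes discussed below matter.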
> We don't need to go far: the colon character.
Which is not used in variable names or unquoted atoms,
and is therefore outside the scope of the proposal.
> Let's suppose Erlang's "native" colon is U+003A. Now, there are at least three characters that look very similar: U+A789, U+2236, U+05C3. Now you can produce code that will confuse the hell out of you. Which colon is the right colon?
U+003A = COLON
Hard to believe that Erlang was invented in Sweden, isn't it?
From the Wikipedia page on 'Colon_(punctuation)':
Word-medial separator
In Finnish and Swedish, the colon can appear inside words
in a manner similar to the apostrophe in the English
possessive case, connecting a grammatical suffix to an
abbreviation or initialism, a special symbol, or a digit
(e.g., Finnish USA:n and Swedish USA:s for the genitive
case of "USA", Finnish %:ssa for the inessive case of "%",
or Finnish 20:een for the illative case of "20").
Abbreviation
In Swedish, the colon is used in contractions, such as S:t
for Sankt (Swedish for "Saint"), e.g., in the Stockholm metro
station S:t Eriksplan. This can even occur in people's names,
for example Antonia Ax:son Johnson (Ax:son for Axelson). The
colon was also used to mark abbreviations in early modern English.
U+05C3 = HEBREW PUNCTUATION SOF PASUQ (end-of-"verse" cantillation mark)
If people start incorporating portions of the Torah in their
Erlang code and notating the whole for chanting, we could be in
real trouble. Until then, not.
It's in class 'Po', which the proposal before us doesn't use.
It would remain *illegal* in Erlang.
U+2236 = RATIO (mathematics)
I've lived my whole educated life not distinguishing this in any
way from plain old colon; it's not clear to me what if anything
would stop Erlang *actively not caring* which one you use.
It's in class 'Sm', which the proposal before us doesn't use.
It would remain *illegal* in Erlang.
U+A789 = MODIFIER LETTER COLON.
This one looks tricky. Wikipedia lists about a dozen
languages that use this to indicate tone. UAX#31 says
Modifier letters (General_Category=Lm) are also
included in the definition of the syntax classes
for identifiers. Modifier letters are often part
of natural language orthographies and are useful
for making word-like identifiers in formal languages.
On the other hand, modifier symbols
(General_Category=Sk), which are seldom a part of
language orthographies, are excluded from identifiers.
So does the proposal before us require, for the sake of Budu,
foo꞉bar as an identifier?
Actually, NO. It's *called* "MODIFIER LETTER COLON", but
it is *classified* as a modifier symbol (Sk), and as such,
explicitly excluded from Unicode identifiers. It's not
currently used for anything in Erlang, so
It would remain *illegal* in Erlang.
In short, of the four colon-like characters mentioned,
there is one and only one which would be allowed in Erlang
by the proposal before us.
This is just FUD.
The "colon problem" is NOT GOING TO HAPPEN. The sky is still not falling.
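For anyone who wants to check the classifications above, a minimal sketch using Python's standard unicodedata module follows. The names COLON_LIKE and CATEGORIES are mine, and the identifier test at the end uses Python's own identifier rules (which follow UAX#31) as a stand-in; it is not the proposed Erlang tokeniser.

```python
import unicodedata

# The four colon-like characters discussed in this thread.
COLON_LIKE = ["\u003a", "\u05c3", "\u2236", "\ua789"]

# Map "U+XXXX" to the character's Unicode General_Category.
CATEGORIES = {f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in COLON_LIKE}

for code_point, category in CATEGORIES.items():
    print(code_point, category)

# Python identifiers follow UAX#31, so this shows how a UAX#31-style
# rule treats U+A789 inside a name (assumption: used only as a proxy).
print("foo\ua789bar".isidentifier())
```

None of the four falls in a letter class (Lu, Ll, Lt, Lm, Lo), so a UAX#31-style identifier rule admits none of them inside a name.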