[erlang-questions] A proposal for Unicode variable and atom names in Erlang.

Richard O'Keefe ok@REDACTED
Fri Oct 26 05:32:45 CEST 2012


On 22/10/2012, at 11:45 PM, Yurii Rashkovskii wrote:

> Also, consider this: there are characters that look the same but encoded differently.

You did read the part of the proposal that said to normalise?
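
A minimal sketch of what normalising buys you, in Python for illustration only. The two sample strings and the choice of NFC are assumptions; this message does not say which normalisation form the proposal picks.

```python
import unicodedata

# Two encodings of the same visible text (illustrative strings, not taken
# from the proposal).
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def normalise(name: str) -> str:
    """Return the NFC form of a name (NFC is an assumption here)."""
    return unicodedata.normalize("NFC", name)


# Raw comparison sees two different code point sequences.
same_before = precomposed == decomposed
# After normalisation both spellings collapse to one sequence.
same_after = normalise(precomposed) == normalise(decomposed)

print(same_before, same_after)
```

Comparing names only after normalisation is what makes "looks the same, encoded differently" a non-issue for this kind of pair.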

> We don't need to go far: the colon character.

Which is not used in variable names or unquoted atoms,
and is therefore outside the scope of the proposal.

> Lets suppose Erlang's "native" colon is U+003A. Now, there is at least three characters that look very similar: U+A789, U+2236, U+05C3. Now you can produce a code that will confuse the hell out of you. Which colon is the right colon?

U+003A = COLON
	Hard to believe that Erlang was invented in Sweden, isn't it?
	From the Wikipedia page on 'Colon_(punctuation)':

	    Word-medial separator

		In Finnish and Swedish, the colon can appear inside words
		in a manner similar to the apostrophe in the English
		possessive case, connecting a grammatical suffix to an
		abbreviation or initialism, a special symbol, or a digit
		(e.g., Finnish USA:n and Swedish USA:s for the genitive
		case of "USA", Finnish %:ssa for the inessive case of "%",
		or Finnish 20:een for the illative case of "20").

	    Abbreviation

		In Swedish, the colon is used in contractions, such as S:t
		for Sankt (Swedish for "Saint"), e.g., in the Stockholm metro
		station S:t Eriksplan.  This can even occur in people's names,
		for example Antonia Ax:son Johnson (Ax:son for Axelson).  The
		colon was also used to mark abbreviations in early modern English.

U+05C3 = HEBREW PUNCTUATION SOF PASUQ (end-of-"verse" cantillation mark)
	If people start incorporating portions of the Torah in their
	Erlang code and notating the whole for chanting, we could be in
	real trouble.  Until then, not.

	It's in class 'Po', which the proposal before us doesn't use.
	It would remain *illegal* in Erlang.

U+2236 = RATIO (mathematics)
	I've lived my whole educated life not distinguishing this in any
	way from plain old colon; it's not clear to me what, if anything,
	would stop Erlang *actively not caring* which one you use.

	It's in class 'Sm', which the proposal before us doesn't use.
	It would remain *illegal* in Erlang.

U+A789 = MODIFIER LETTER COLON.
	This one looks tricky.  Wikipedia lists about a dozen
	languages that use this to indicate tone.  UAX#31 says

		Modifier letters (General_Category=Lm) are also 
		included in the definition of the syntax classes
		for identifiers.  Modifier letters are often part
		of natural language orthographies and are useful
		for making word-like identifiers in formal languages.
		On the other hand, modifier symbols
		(General_Category=Sk), which are seldom a part of
		language orthographies, are excluded from identifiers. 


	So does the proposal before us require, for the sake of Budu,
	foo꞉bar as an identifier?

	Actually, NO.  It's *called* "MODIFIER LETTER COLON", but
	it is *classified* as a modifier symbol (Sk), and as such,
	explicitly excluded from Unicode identifiers.  It's not
	currently used for anything in Erlang, so
	it would remain *illegal* in Erlang.

In short, of the four colon-like characters mentioned,
there is one and only one which would be allowed in Erlang
by the proposal before us.
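
The classifications quoted above are easy to check against the Unicode Character Database. The following is a small sketch in Python using the standard `unicodedata` module; the helper name `describe` is made up for illustration, and the result depends on the Unicode version bundled with your Python.

```python
import unicodedata

# The four colon-like characters discussed in this message.
COLON_LIKE = ["\u003a", "\u05c3", "\u2236", "\ua789"]


def describe(ch: str) -> tuple:
    """Return (code point, General_Category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


table = [describe(ch) for ch in COLON_LIKE]
for code_point, category, name in table:
    print(code_point, category, name)
```

None of the four falls in a letter category: two are Po, one is Sm, and one is Sk, which matches the classes named above.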

This is just FUD.

The "colon problem" is NOT GOING TO HAPPEN.  The sky is still not falling.



