[erlang-questions] Pmods, packages, Unicode source code and column numbers in compiler - what will happen in R16?

Thu Oct 18 23:42:18 CEST 2012

On 18/10/2012, at 8:52 PM, Vlad Dumitrescu wrote:
> Thank you for the Unicode treatise, but sometimes one has to read
> between the lines and see what the author meant to say, even if he
> didn't express himself so that it couldn't be used against himself in
> a court of law.
> 
> The underscore as a variable prefix is special for the compiler, as
> warnings for that variable being unused are not emitted. That's why I
> took a character I knew was unused, just as an example.

The underscore as a variable prefix is not _that_ special for the
compiler.  Again, this is no new thing.  Prolog did the same thing.
You could perfectly well have a policy that

	_<Latin 1 letter or digit>...
		compiler does not warn about singleton use
	_<extended letter>...
		compiler DOES warn about singleton use
	__<extended letter>...
		compiler does NOT warn about singleton use
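
To make that policy concrete, here is a rough Python sketch; it is not anything the Erlang compiler actually does. The function name `warns_if_singleton` is made up for illustration. "Latin 1" is taken to mean code points up to U+00FF, and "extended letter" to mean any character in a Unicode L* category above that. Cases the table above does not mention (a bare `_`, or `_` followed by a non-Latin-1 digit) are assumed to stay quiet.

```python
import unicodedata

def warns_if_singleton(name):
    """Return True if a singleton use of this variable should draw a warning.

    Hypothetical policy from the text:
      _<Latin 1 letter or digit>...  -> no warning
      _<extended letter>...          -> warning
      __<extended letter>...         -> no warning
    """
    if not name.startswith("_"):
        return True   # ordinary variable: warn as usual
    rest = name[1:]
    if rest:
        first = rest[0]
        is_extended_letter = (
            ord(first) > 0xFF
            and unicodedata.category(first).startswith("L")
        )
        if is_extended_letter:
            return True   # _<extended letter>...
    # "_" alone, _<Latin 1 letter or digit>..., or a second underscore
    return False
```

Under these assumptions `_Unused` and `__日付` stay quiet, while `_日付` is reported like any other singleton.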

In any case, if you adopt Unicode identifier syntax, a whole
bunch of extra characters become available:
	U+203F ‿
	U+FE34 ︴
	U+FE4D ﹍
	U+FE4E ﹎
	U+FE4F ﹏
among them.  We could perfectly well say that an identifier beginning
with a Pc character is a variable, which would generalise the current
"_" rule, and the compiler would _not_ be treating those specially.

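For anyone who wants to see which characters are in Pc, the following Python sketch lists them from the interpreter's own Unicode tables. The helper name `connector_punctuation` is invented here, and the exact list depends on the Unicode version bundled with your Python, so no count is claimed.

```python
import sys
import unicodedata

def connector_punctuation():
    """Return every character whose general category is Pc."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Pc"
    ]

if __name__ == "__main__":
    for ch in connector_punctuation():
        # unicodedata.name takes a default for unnamed code points
        print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch, '<unnamed>')}")
```

The ordinary "_" (U+005F) shows up in that list alongside the characters above.
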
> 
> Anyway, I find here (http://www.unicode.org/reports/tr31/) that "Each
> programming language can define its identifier syntax as relative to
> the Unicode identifier syntax,

Right.  Already noted, which is why we can continue to allow '@' and '.'.

> The actual point of my note is this: there must be a way to make a
> difference between atoms and variables.

We have one right now:
	A variable begins with a character that is in
	(Lu | Lt | Pc) & Latin1.
All we do is remove the restriction to Latin1.  Done!
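
As a sketch of how small that change is, here is the rule written as a Python predicate over the first character of a token. The name `starts_variable` and the `latin1_only` flag are assumptions made for illustration, with Latin1 taken as code points up to U+00FF. This is not the Erlang scanner, and it assumes a non-empty token.

```python
import unicodedata

# General categories that may begin a variable: uppercase letter,
# titlecase letter, connector punctuation.
VARIABLE_START = {"Lu", "Lt", "Pc"}

def starts_variable(token, latin1_only=False):
    """True if the token's first character is in (Lu | Lt | Pc).

    With latin1_only=True the character must also be in Latin1,
    which models the current rule; without it, the restriction
    is simply dropped.
    """
    first = token[0]
    if latin1_only and ord(first) > 0xFF:
        return False
    return unicodedata.category(first) in VARIABLE_START
```

With the restriction in place, `Date` and `_x` begin variables and `‿日付` does not; with it removed, `‿日付` and `Дата` do as well, while `日付` (first character in Lo) still does not.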

> Some languages add a marker
> before atoms, some before variables, and Erlang uses the
> capitalization of the first letter. With full unicode, there is no
> clear way to use that rule anymore,

Yes there is.  Keep the existing rule exactly as it stands except
for removing the restriction to Latin1.  Using a script with case?
(*LOTS* of languages.)  You can use an Lu or Lt character.  Using
a script without?  You can use any Pc character you like.  "‿" is
kind of cute.

> so I observed that an alternative could be to do like other languages do.

We don't _need_ an alternative.  Once we break the Latin1 boundary
we have more underscore-like characters to work with, and if you
want to think of them as funny prefixes, good luck to you and
‿日付 will work, but if you want to think of them as differently
shaped underscores, then ‿日付‿今日 will work too (if anyone _wants_ it
to, which is of course another matter).

if you want