[erlang-questions] Pmods, packages, Unicode source code and column numbers in compiler - what will happen in R16?

Thu Oct 18 09:52:57 CEST 2012

Hi Richard,

On Thu, Oct 18, 2012 at 1:34 AM, Richard O'Keefe <ok@REDACTED> wrote:
> On 17/10/2012, at 8:11 PM, Vlad Dumitrescu wrote:
>> On Wed, Oct 17, 2012 at 1:22 AM, Richard O'Keefe <ok@REDACTED> wrote:
>>> "Variable names will continue to be limited to Latin characters."
>>>
>>>        I hope that means "for this release."
>>
>> That's an interesting problem. Variable names are defined as starting
>> with an upper case letter, but the only scripts that I know of that
>> have those are roman, greek, cyrillic and armenian.
>
> "Variables are defined as starting with an upper case letter"
> isn't exactly true, unless you do what Quintus did back in the
> 80s and redefine "_" from being a 'punctuation connector' to
> an 'upper case letter'.  Quintus did that for CJK, so that 日付
> was an unquoted atom and _日付 was a Prolog variable.
> This was apparently acceptable, and the same practice is followed in
> other Prologs.  I see no reason why it would not work for Erlang,
> where _1 is a perfectly good variable.

Thank you for the Unicode treatise, but sometimes one has to read
between the lines and see what the author meant to say, even if he
didn't phrase it precisely enough to hold up in a court of law.

The underscore as a variable prefix is special to the compiler:
warnings about such a variable being unused are not emitted. That's
why I picked a character that I knew was unused, just as an example.

Anyway, I find here (http://www.unicode.org/reports/tr31/) that "Each
programming language can define its identifier syntax as relative to
the Unicode identifier syntax, such as saying that identifiers are
defined by the Unicode properties, with the addition of “$”. By
addition or subtraction of a small set of language specific
characters, a programming language standard can easily track a growing
repertoire of Unicode characters in a compatible way.", so I see no
problems with '§'. Besides that, I would see this initial character
not as part of the identifier, but as a marker for "here comes a
variable name", just as '?' is used for macros and '#' for records.
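
To make the marker idea concrete, here is a minimal sketch in Python
(not Erlang, and not a proposal for the actual scanner). It assumes a
hypothetical scheme in which '§' alone decides that a token is a
variable and the capitalization rule is ignored. Python's
str.isidentifier() stands in for the Unicode identifier syntax; it
follows Python's own profile of it, which also accepts a leading
underscore. The name classify is made up for illustration.

```python
VARIABLE_MARKER = "§"  # hypothetical marker, by analogy with '?' and '#'


def classify(token: str) -> str:
    """Classify a token as 'variable', 'atom' or 'invalid'.

    Assumption: the marker is not part of the name; it only announces
    that a variable name follows.
    """
    if token.startswith(VARIABLE_MARKER):
        name = token[len(VARIABLE_MARKER):]
        # The part after the marker must be a Unicode identifier.
        return "variable" if name.isidentifier() else "invalid"
    # Without the marker, any Unicode identifier is treated as an atom.
    return "atom" if token.isidentifier() else "invalid"


for tok in ["§日付", "日付", "§", "§1x"]:
    print(tok, classify(tok))
```

In this sketch '§日付' is a variable and '日付' is an atom, without any
reference to letter case.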

The actual point of my note is this: there must be a way to
distinguish between atoms and variables. Some languages add a marker
before atoms, some before variables, and Erlang uses the
capitalization of the first letter. With full Unicode, there is no
clear way to use that rule anymore, so I observed that an alternative
could be to do as other languages do. The exact form is mostly
irrelevant.
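
As a rough illustration of why the capitalization rule stops working,
the following Python sketch (the function name first_letter_case is
made up for the example) reports whether the first character of a
name is upper case, lower case, or has no case at all. It relies on
Python's str.isupper() and str.islower(), which are based on the
Unicode case properties.

```python
def first_letter_case(name: str) -> str:
    """Return 'upper', 'lower' or 'caseless' for the first character.

    Assumes a non-empty name.
    """
    first = name[0]
    if first.isupper():
        return "upper"
    if first.islower():
        return "lower"
    # Characters from scripts without case, digits, '_' and so on.
    return "caseless"


for name in ["Variable", "atom", "Δx", "日付"]:
    print(name, first_letter_case(name))
```

Names in Latin or Greek letters can be sorted into variables and atoms
by this test, but a name such as 日付 comes out as caseless, so the
rule gives no answer for it.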

best regards,
Vlad