Language change proposal
Richard A. O'Keefe
ok@REDACTED
Wed Nov 5 02:25:29 CET 2003
I wrote:
> By the way, the Unicode book spells out clear, simple, and usable rules
> for identifier syntax.
Joachim Durchholz <joachim.durchholz@REDACTED> replied:
    Ah, wonderful.
    Do you have a URL, or a set of promising Google keywords?

Well, it doesn't take the brain of a Feynman to figure out that
the Unicode book is the best place to look, or failing that, www.unicode.org.
In fact it's Section 5.15 "Identifiers" in the Unicode 4.0 book,
and a draft replacement for that section can be found in
http://www.unicode.org/reports/tr31/
"The formal syntax provided here is intended to capture the general
intent that an identifier consists of a string of characters that
begins with a letter or an ideograph, and then includes any number
of letters, ideographs, digits, or underscores. Each programming
language standard has its own identifier syntax; different
programming languages have different conventions for the use of
certain characters from the ASCII range ($, @, #, _) in identifiers.
To extend such a syntax to cover the full behavior of a Unicode
implementation, implementers need only combine these specific rules
with the sample syntax provided here.

Syntactic Rule

    <identifier> := <identifier_start>
                    (<identifier_start> | <identifier_extend>)* "

Since Erlang _doesn't_ use anything other than letters, digits, and
underscores, the Unicode rules would apply exactly.
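
To make the quoted rule concrete, here is a minimal Python sketch of it. The two category sets are an assumption, modelled on the general-category style of definition (letters and letter numbers to start; marks, decimal digits, and connector punctuation such as the underscore to continue); the book and TR31 are the authority on the exact sets, and format characters are left out here. The function names are made up for illustration, and the sketch says nothing about Erlang's own conventions for what a variable or an atom may start with.

```python
import unicodedata

# Assumed category sets for this sketch; check them against the Unicode text.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
EXTEND_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}  # "_" has category Pc


def is_identifier_start(ch):
    """True if ch may begin an identifier under the assumed sets."""
    return unicodedata.category(ch) in START_CATEGORIES


def is_identifier_extend(ch):
    """True if ch may only continue an identifier under the assumed sets."""
    return unicodedata.category(ch) in EXTEND_CATEGORIES


def is_identifier(text):
    """<identifier> := <identifier_start> (<identifier_start> | <identifier_extend>)*"""
    if not text or not is_identifier_start(text[0]):
        return False
    return all(
        is_identifier_start(ch) or is_identifier_extend(ch) for ch in text[1:]
    )
```

Under these assumptions is_identifier("foo_bar") holds, while "9lives" and "a-b" are rejected. A leading underscore is rejected too, because the bare rule treats "_" as a continuation character only; that is exactly the kind of language-specific convention the quoted text says each language must add for itself.
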
There are some subtleties to all this concerning normalisation
and the non-breaking format characters, but once you've figured out how
to represent a classification scheme for over a million characters
economically (not, actually, all that hard), the rest is easy.
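
One common way to store such a classification compactly is a two-stage table: cut the code space into fixed-size blocks, keep each distinct block once, and index into the shared blocks. The Python sketch below illustrates that idea only; it is not necessarily the representation I had in mind above. The block size, the three class codes, and the category sets are all assumptions made for the example.

```python
import unicodedata

BLOCK = 256  # code points per block; an arbitrary choice for this sketch
OTHER, START, EXTEND = 0, 1, 2  # hypothetical class codes


def classify(cp):
    """Slow reference classification of one code point (assumed category sets)."""
    cat = unicodedata.category(chr(cp))
    if cat in ("Lu", "Ll", "Lt", "Lm", "Lo", "Nl"):
        return START
    if cat in ("Mn", "Mc", "Nd", "Pc"):
        return EXTEND
    return OTHER


def build_tables():
    """Return (index, blocks): index maps a block number to a shared block."""
    blocks = []  # distinct blocks, each a bytes object of BLOCK class codes
    seen = {}    # block contents -> position in blocks
    index = []   # one entry per block of the whole code space
    for base in range(0, 0x110000, BLOCK):
        block = bytes(classify(cp) for cp in range(base, base + BLOCK))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


INDEX, BLOCKS = build_tables()


def lookup(cp):
    """Classify a code point with two table reads."""
    return BLOCKS[INDEX[cp // BLOCK]][cp % BLOCK]
```

Large stretches of the code space (unassigned ranges, the big ideograph ranges, private use) produce identical blocks, so far fewer blocks are stored than index entries, and each lookup costs two array reads.
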