[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Richard A. O'Keefe ok@REDACTED
Tue Jan 17 07:38:55 CET 2017

On 14/01/17 11:34 AM, Michał Muskała wrote:
> I fully agree there are no languages that deal with strings perfectly.
> That said there are those that are better at it and those that aren't so
> good. A language, where I need to look for a library to upcase or
> downcase my own name, fits into the second group in my book.

Let's see, I use Fortran, C, occasionally C++ or Objective C,
Python, Prolog, Erlang, Lisp, Scheme, R, Ada (when Apple haven't
broken my gcc setup *again*), Smalltalk, Java, C#, SML, F#,
sometimes a bit of JavaScript (yuck).  In *all* of them, if
there are operations to convert case, the operations are in fact
*library* operations.  In *all* of them, I have to look them up.
Many of these languages were defined before Unicode was dreamed
of, and to the extent that they do Unicode casing at all, do it
wrong.  (Case conversion can change the length of a string, even
for text that is not very exotic.)  There is even a script in Unicode where
for several editions of the Unicode standard we were told that
there *were* two cases, but one of them was obsolete, so that
while you *could* convert, you shouldn't.  Then they said
oops, they are actually two scripts, and the one we said
should not be converted to "lower case" actually has *another*
block of characters that really *are* lower case, and it's OK
to use those.  (Was it Georgian?  I forget.)
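As a small illustration of the length point, here is a Python sketch. Python is used only because it is one of the languages listed above, and `case_length_changes` is a hypothetical helper, not a library function. Python's `str.upper()` applies Unicode's full case mappings, so the German "ß" upper-cases to the two letters "SS":

```python
from typing import Tuple

# Hypothetical helper for illustration: full Unicode case mapping
# can make a string longer, not just change its letters.
def case_length_changes(s: str) -> Tuple[int, int]:
    """Return (length before, length after) upper-casing s."""
    return len(s), len(s.upper())


if __name__ == "__main__":
    word = "straße"  # ordinary German; 'ß' maps to "SS"
    print(word.upper())               # STRASSE
    print(case_length_changes(word))  # (6, 7)
```

Any code that assumes "same length in, same length out" (fixed-size buffers, index arithmetic across a case change) is broken by this one ordinary word.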

Which of course brings up the points that
  - case conversion for Private Use Area characters may be
    important, but it is by definition not defined by Unicode
  - the Unicode database *changes* frequently; they try to
    keep as much stable as they can, but sometimes case mappings
    for existing characters *do* change, and of course characters
    that were not previously defined may need conversion.
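Both points can be seen from Python's `unicodedata` module, which reports the version of the Unicode database it was built with. Python also leaves Private Use Area characters untouched when converting case. The sketch below assumes a *hypothetical* private agreement that U+E000 is the lower-case partner of U+E001. `PUA_UPPER` and `upper_with_overrides` are illustrative names, not part of any library:

```python
import unicodedata

# Hypothetical private-use case table: U+E000 (lower) -> U+E001 (upper).
# Unicode assigns no case mappings in the PUA, so any such table is an
# agreement between whoever uses those code points.
PUA_UPPER = {"\ue000": "\ue001"}


def upper_with_overrides(s: str, overrides: dict) -> str:
    """Upper-case s one character at a time, consulting overrides first."""
    return "".join(overrides.get(ch, ch.upper()) for ch in s)


if __name__ == "__main__":
    # The UCD version this interpreter was built with; it changes between releases.
    print(unicodedata.unidata_version)
    print("\ue000".upper() == "\ue000")  # True: Unicode defines no mapping here
    print(upper_with_overrides("a\ue000", PUA_UPPER) == "A\ue001")  # True
```

The answer a program gives therefore depends both on which Unicode version its library was built against and on tables that Unicode deliberately does not provide.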

I had a colleague talking to me today who is a D evangelist.
He just loves the language as a better C than C and not
stonkingly bad like C++.  Today he was whingeing that
Unicode is up to version 9.something but D was still only
up to Unicode version 5.1, so the language *purported* to
handle Unicode, but for his purposes, didn't.  He looked
into upgrading it himself (he's serious about liking D),
but the tools they used to convert the Unicode table files
to whatever it is that D uses in the library weren't in the
repository.  (I understand that they are now.)

Don't forget that Unicode has multiple clones of the ASCII letters
and digits.  There are for example several differently styled clones
for use in mathematics -- oops, Unicode wasn't supposed to
do that kind of thing, and it's done much less thoroughly for
Hebrew and Greek letters, also used in mathematics -- but *those*
characters shouldn't be case converted.  For example,
in Physics g and G are quite different things.
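To see why this matters, here is a small Python sketch. Python is used purely for illustration, and `naive_upper` is a hypothetical helper, not a library function. MATHEMATICAL BOLD SMALL G (U+1D420) has no case mapping, so `str.upper()` leaves it alone. A program that compatibility-normalizes (NFKC) text before upper-casing, however, silently turns it into an ordinary "G":

```python
import unicodedata

# U+1D420 MATHEMATICAL BOLD SMALL G: a styled clone of ASCII 'g'.
BOLD_G = "\U0001D420"


def naive_upper(s: str) -> str:
    """Hypothetical 'normalize then upper-case' pipeline; loses the styling."""
    return unicodedata.normalize("NFKC", s).upper()


if __name__ == "__main__":
    print(unicodedata.name(BOLD_G))               # MATHEMATICAL BOLD SMALL G
    print(BOLD_G.upper() == BOLD_G)               # True: no case mapping defined
    print(unicodedata.normalize("NFKC", BOLD_G))  # g
    print(naive_upper(BOLD_G))                    # G: the physicist's g is gone
```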

Here's a thing I would like, and it seems to me the kind
of thing that *should* be in language standards.
I want to generate a number in the user's locale.
How do I get the user's digits?  Last time I looked
there were over 50 copies of the digits 0..9.
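The mechanical half of this is easy; the locale half is not. Unicode keeps each decimal digit set as a contiguous run from 0 to 9, so once you know the right zero you can render a number by offsetting. The Python sketch below counts the digit sets in the interpreter's own Unicode database and renders a number in a chosen set. `zero_digits` and `to_script_digits` are hypothetical names. The sketch assumes the caller already knows which zero the user wants, and that choice is exactly what would have to come from locale data (for example CLDR's numbering systems), which the sketch does not attempt:

```python
import sys
import unicodedata


def zero_digits() -> list:
    """Every code point whose decimal digit value is 0 (one per digit set)."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.decimal(chr(cp), None) == 0]


def to_script_digits(n: int, zero: str) -> str:
    """Render a non-negative integer in the digit set whose 0 is `zero`.

    Relies on Unicode encoding each decimal digit set as a contiguous 0..9 run.
    """
    if n < 0:
        raise ValueError("sketch handles non-negative integers only")
    base = ord(zero)
    return "".join(chr(base + int(d)) for d in str(n))


if __name__ == "__main__":
    # The count depends on the Unicode version Python was built with.
    print(len(zero_digits()), "digit sets in Unicode", unicodedata.unidata_version)
    # U+0966 is DEVANAGARI DIGIT ZERO; which zero a user wants is the locale question.
    print(to_script_digits(2017, "\u0966"))
```

Python's `int()` accepts any Unicode decimal digits, so parsing back is easy. Choosing the digits for output is the part that needs locale data.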

So this stuff is insanely complex and changes frequently.
This is EXACTLY the kind of stuff that belongs in libraries,
not language cores.

Note that I am agreeing with the idea of *better* libraries
for Erlang.  Absolutely.  But it's one of those things where
it's always going to be easier to do something else first:
paint the house, get another PhD, raise a child, ...
