[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Tue Jan 31 07:49:30 CET 2017

On Tuesday, 17 January 2017 08:17:10 Michał Muskała wrote:
> On 17 Jan 2017, 07:39 +0100, Richard A. O'Keefe <ok@REDACTED>, wrote:
> 
> >
> > Let's see, I use Fortran, C, occasionally C++ or Objective C,
> > Python, Prolog, Erlang, Lisp, Scheme, R, Ada (when Apple haven't
> > broken my gcc setup *again*), Smalltalk, Java, C#, SML, F#,
> > sometimes a bit of JavaScript (yuck). In *all* of them, if
> > there are operations to convert case, the operations are in fact
> > *library* operations. In *all* of them, I have to look them up.
> >
> 
> I just checked and all: Python 3, Ruby, Java, C# (so probably F# as well) and JavaScript properly uppercase the character "ł" using the built-in functions, without any additional libraries.

Those libraries may upper/lower your name, but none of them have any clue about the equivalence of

32
三十二
参拾弐
３２

Not to mention REALLY EASY one-for-one stuff like

Nihon
NIHON
にほん
ニホン
ﾆﾎﾝ

The equivalent of length() doesn't even come out the same over the last three in some languages. And which of those is "upper case"? And why don't we have a ready conversion for them? And why must にっぽん be 4 "characters" in hiragana but convert to 5 in halfwidth katakana (ﾆｯﾎﾟﾝ) yet still 4 in full-width katakana (ニッポン)?
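
As a rough illustration of the length point, here is a minimal Python sketch. It assumes that "length" means a count of Unicode code points, which is what Python's len() reports for str; other languages count bytes or UTF-16 units and will give different numbers. The names FORMS, codepoint_lengths and fold_width are made up for this example.

```python
import unicodedata

# The three spellings of "Nippon" quoted above.
FORMS = {
    "hiragana": "にっぽん",
    "halfwidth katakana": "ﾆｯﾎﾟﾝ",
    "fullwidth katakana": "ニッポン",
}


def codepoint_lengths(forms):
    """Return the number of code points in each spelling."""
    return {name: len(text) for name, text in forms.items()}


def fold_width(text):
    """Fold halfwidth/fullwidth compatibility forms using NFKC.

    This merges the halfwidth base letter and its separate semi-voiced
    mark into one fullwidth katakana letter. It does NOT convert between
    hiragana and katakana; that still needs a separate rule table.
    """
    return unicodedata.normalize("NFKC", text)
```

Calling codepoint_lengths(FORMS) shows the 4/5/4 split, and fold_width() on the halfwidth form gives back the 4-code-point full-width spelling. Width folding is the one conversion in this family that a stock normalization form happens to cover; the hiragana/katakana step is not covered.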

etc.

In almost every programming language or environment I use, I am required to write some conversion utilities that cover the gazillion direct, "simple" conversions that exist for my primary language -- and Japanese users are actually used to expecting less out of life and have simply come to accept that computers cannot provide an interface that approximates the language in use.
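
To give a sense of what one such utility can look like, here is a small Python sketch that reads the four spellings of 32 listed earlier. It is deliberately incomplete: the tables DIGITS and UNITS and the function parse_number are hypothetical names for this example, only a few formal (daiji) digits are included, and nothing above 9999 (万 and up) is handled.

```python
# Hand-written tables: ordinary kanji digits plus a few formal (daiji) forms.
DIGITS = {
    "〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
    "五": 5, "六": 6, "七": 7, "八": 8, "九": 9,
    "壱": 1, "弐": 2, "参": 3,
}
# Multipliers, again with one formal variant.
UNITS = {"十": 10, "拾": 10, "百": 100, "千": 1000}


def parse_number(text):
    """Parse ASCII digits, fullwidth digits, or a small kanji numeral."""
    if not text:
        raise ValueError("empty numeral")
    if text.isdecimal():
        # Python's int() accepts any Unicode decimal digits, so this
        # branch covers both "32" and the fullwidth form.
        return int(text)
    total = 0
    pending = None  # digit waiting for a multiplier, if any
    for ch in text:
        if ch in DIGITS:
            if pending is not None:
                raise ValueError("two digits in a row: %r" % text)
            pending = DIGITS[ch]
        elif ch in UNITS:
            # A bare multiplier counts once, e.g. 十 alone is 10.
            total += (1 if pending is None else pending) * UNITS[ch]
            pending = None
        else:
            raise ValueError("unsupported character %r" % ch)
    return total + (pending or 0)
```

The loop itself is trivial; nearly all of the work is in deciding what goes into the tables, and real text needs far larger ones than these.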

In a few cases a wrapper to an input method library can help (Anthy and mozc libs, for example, have almost all of this covered already, but only for Japanese) but often it is just about as hard to write wrappers as it is to just write conversions in the language being used (and same-language code is dramatically easier for another programmer to understand and extend later on when something changes).

While Unicode libs often cover many of the Latin-ish cases, they simply ignore any instances where "case" is not a binary concept -- and those are the majority in Asia. EVERYTHING is an extended library here; no language or platform has this stuff built in (except for platforms from Nintendo, because they are magical unicorns who actually pay for enormous amounts of refinement work). So if the built-ins aren't going to handle ANY proper text conversions for the half of the world that lives outside the Western(ish) language sphere, why should they be expected to be aware of conversions above codepoint 127 at all, when they will only ever cover a tiny fraction of them?

Don't misconstrue this as an argument for "fairness" -- whatever that would even mean. I am arguing for a lack of surprise. It is surprising to me that there is this idea of "upper" and "lower" case conversions that are implicit and built-in, but that there is no concept of script casting in general when it is the norm in so many non-European languages. Functions like upper() and lower() are just special cases of script casting and only make sense in the context of Latin/Greek(ish) alphabets.

A more general function like scriptcast(String, ScriptName) is actually the real solution that covers all of these. The algorithmic complexity of such a function is close to zero, but the number of cases that must be hand- or rule-written is enormous. I don't understand arguing for a handful of non-trivial ones and then totally ignoring the vast sea of other equally necessary conversions.
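
A minimal sketch of the idea in Python, under loud assumptions: the name scriptcast comes from the paragraph above, but the script names, the CASTS registry and the tables are invented here for illustration. Only the hiragana/katakana pair (which happens to be a fixed code point offset of 0x60) and ASCII case are covered; halfwidth katakana, romaji, numerals and everything else would each need their own hand-written table.

```python
# Hiragana U+3041..U+3096 maps onto katakana U+30A1..U+30F6 by a fixed offset.
HIRAGANA_TO_KATAKANA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}
KATAKANA_TO_HIRAGANA = {kata: hira for hira, kata in HIRAGANA_TO_KATAKANA.items()}

# ASCII-only case tables: upper/lower treated as just two more script casts.
ASCII_TO_UPPER = {cp: cp - 32 for cp in range(ord("a"), ord("z") + 1)}
ASCII_TO_LOWER = {cp: cp + 32 for cp in range(ord("A"), ord("Z") + 1)}

# Registry of target script name -> translation table (hypothetical names).
CASTS = {
    "katakana": HIRAGANA_TO_KATAKANA,
    "hiragana": KATAKANA_TO_HIRAGANA,
    "upper": ASCII_TO_UPPER,
    "lower": ASCII_TO_LOWER,
}


def scriptcast(text, script_name):
    """Cast text to the named script; unmapped characters pass through."""
    try:
        table = CASTS[script_name]
    except KeyError:
        raise ValueError("no conversion rules for %r" % script_name) from None
    return text.translate(table)
```

With these tables, scriptcast("にほん", "katakana") yields ニホン and scriptcast("Nihon", "upper") yields NIHON through the same code path. The function body is a single table lookup; all of the real effort would go into writing and maintaining the tables.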

Blah blah blah...

Anything more than an upper()/lower() that covers codepoints <128 should never be included in a standard library. upper()/lower() are necessary for case-insensitive string-based instruction standards like HTTP/1.1 (Whose freaking idea was it to make header labels case-insensitive anyway?!?). Outside of that, I just can't see any workable option other than an external lib that implements scriptcast/2,3 and comes bundled with a neverending river of conversion clauses.

-Craig